Abstract

We study the asymptotic properties of the indefinite kernel network with $\ell^q$-norm regularization. The framework under investigation differs from classical kernel learning in that the kernel function is not required to be positive semidefinite. By a new stepping-stone technique, satisfactory error bounds and learning rates are deduced without any interior cone condition on the input space $X$ or regularity condition on the probability measure $\rho_X$.

1. Introduction

Regression learning has been widely applied in economic decision making, engineering, computer science, and especially statistics. To fix ideas, let the input space $X$ be a compact domain of $\mathbb{R}^n$, let the output space $Y$ be a subset of $\mathbb{R}$, and let $\rho$ be a Borel probability distribution on $Z = X \times Y$ from which the random variable $(x, y)$ is drawn. Let $\rho(\cdot \mid x)$ be the conditional distribution of $y$ given $x$, and let $\rho_X$ be the marginal distribution on $X$. The goal of regression learning is to learn the regression function $f_\rho$, which is given by
$$f_\rho(x) = \int_Y y \, d\rho(y \mid x), \qquad x \in X.$$

As the distribution $\rho$ is unknown, $f_\rho$ cannot be calculated directly. In the learning theory framework, one obtains an approximation of $f_\rho$ from a set of observations $\mathbf{z} = \{(x_i, y_i)\}_{i=1}^m$ drawn identically and independently according to $\rho$. The least squares scheme is the most popular method for the regression problem. The generalization error of a measurable function $f : X \to \mathbb{R}$,
$$\mathcal{E}(f) = \int_Z (f(x) - y)^2 \, d\rho,$$
is the mean error suffered from the use of $f$ as a model for the process producing $y$ from $x$; see Cucker and Zhou [1]. A simple computation shows that [1]
$$\mathcal{E}(f) - \mathcal{E}(f_\rho) = \|f - f_\rho\|_{L^2_{\rho_X}}^2.$$
Thus, $\mathcal{E}(f) - \mathcal{E}(f_\rho)$ can be used to measure how well $f$ approximates $f_\rho$.

In order to minimize the expected risk functional, we employ the empirical risk functional
$$\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2. \tag{4}$$
The empirical risk minimization (ERM) principle is to approximate the function which minimizes the risk by the function which minimizes the empirical risk (4) in some hypothesis space $\mathcal{H}$. Kernel-based machines usually take a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$ as the hypothesis space, associated with a Mercer kernel $K$. For definitions and properties, see [2]. The well-known regularized least squares regression algorithm is
$$f_{\mathbf{z}} = \arg\min_{f \in \mathcal{H}_K} \left\{ \mathcal{E}_{\mathbf{z}}(f) + \lambda \|f\|_K^2 \right\}.$$
The penalty term $\lambda \|f\|_K^2$ with regularization parameter $\lambda > 0$ serves to avoid ill-posedness and overfitting. In the literature, a variety of consistency analyses have been carried out, for instance, capacity-dependent ones (e.g., [3, 4] and references therein) and capacity-independent ones (e.g., [5–8]).
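As a concrete point of reference, the regularized least squares scheme above admits a closed-form solution via the representer theorem. The following minimal Python sketch (the toy data, the Gaussian kernel, and the parameter values are our own illustrative choices, not from the paper) fits it with a plain linear solve:

```python
import numpy as np

def rls_fit(x, y, kernel, lam):
    """Regularized least squares over an RKHS, via the representer theorem.

    The minimizer of (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    has the form f_z = sum_j alpha_j * K(., x_j) with
    alpha = (G + lam * m * I)^{-1} y, where G[i, j] = K(x_i, x_j).
    """
    m = len(x)
    G = np.array([[kernel(xi, xj) for xj in x] for xi in x])
    alpha = np.linalg.solve(G + lam * m * np.eye(m), y)
    return lambda t: sum(a * kernel(t, xj) for a, xj in zip(alpha, x))

# Toy usage: a Gaussian (Mercer) kernel and a smooth target.
gauss = lambda s, t: np.exp(-((s - t) ** 2) / 0.18)
x_train = np.linspace(0.0, 3.0, 8)
y_train = np.sin(x_train)
f_z = rls_fit(x_train, y_train, gauss, lam=1e-6)
```

With $\lambda$ very small the fit nearly interpolates the training data; increasing it trades training accuracy for stability.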

In recent years, the coefficient-based regularized kernel network (CRKN) has attracted more attention:
$$f_{\mathbf{z}} = f_{\alpha^{\mathbf{z}}}, \qquad \alpha^{\mathbf{z}} = \arg\min_{\alpha \in \mathbb{R}^m} \left\{ \mathcal{E}_{\mathbf{z}}(f_\alpha) + \lambda \, \Omega(\alpha) \right\},$$
where $f_\alpha = \sum_{i=1}^m \alpha_i K(\cdot, x_i)$ and $\alpha = (\alpha_1, \dots, \alpha_m) \in \mathbb{R}^m$. In this setting, the hypothesis space is replaced by a finite-dimensional, sample-dependent function space
$$\mathcal{H}_{K, \mathbf{z}} = \left\{ \sum_{i=1}^m \alpha_i K(\cdot, x_i) : \alpha_i \in \mathbb{R} \right\},$$
where the kernel $K$ is only required to be continuous and uniformly bounded on $X \times X$ and is called a general kernel. The penalty $\Omega(\alpha)$ is imposed on the coefficients of the function $f_\alpha$.

Some research has been done on the mathematical foundation of CRKN. In [9], the framework of error analysis for this coefficient-based regularization was proposed. CRKN with the $\ell^2$-norm regularizer and an indefinite kernel has been studied; error bounds and learning rates were derived through the explicit expression of the solution and integral operator techniques, not only for independent sampling but also for weakly dependent sampling; see [10, 11].

CRKN with the $\ell^1$-norm regularizer has attracted particular attention, for with a properly chosen regularization parameter $\lambda$ it often leads to sparsity of the coefficients of $f_{\mathbf{z}}$. In [12], Xiao and Zhou analyzed the coefficient regularization using the $\ell^1$ norm. Under a Lipschitz condition on the kernel $K$, a polynomial learning rate was obtained; it is very slow, since the dimension $n$ is usually very large. In [13], under an interior cone condition for the input space $X$ and a regularity condition for the probability measure $\rho_X$, a satisfactory error bound and a sharp convergence rate were obtained. The shortcoming of that analysis is that the conditions are too strict, especially the condition on $\rho_X$, since it covers little more than the almost uniform distribution on $X$. To the best of our knowledge, the probability measure has no direct relation to the consistency of the algorithms. So we attempt to deduce satisfactory error bounds under more general conditions. To this end, we introduced a new stepping-stone technique for the consistency analysis of CRKN with $\ell^1$-norm regularization in [14].

In this paper, we adopt the stepping-stone technique to study the $\ell^q$-norm based CRKN, which is defined as
$$f_{\mathbf{z}} = f_{\alpha^{\mathbf{z}}}, \qquad \alpha^{\mathbf{z}} = \arg\min_{\alpha \in \mathbb{R}^m} \left\{ \mathcal{E}_{\mathbf{z}}(f_\alpha) + \lambda \|\alpha\|_q^q \right\}, \tag{8}$$
where $f_\alpha = \sum_{i=1}^m \alpha_i K(\cdot, x_i)$, and the norm is $\|\alpha\|_q = \left( \sum_{i=1}^m |\alpha_i|^q \right)^{1/q}$.
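Scheme (8) has no closed form for general $q$, but for small samples it can be minimized numerically. The sketch below is purely illustrative (the nonsymmetric kernel, the data, the value $q = 1.5$, and the choice of optimizer are our own assumptions, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def crkn_fit(x, y, kernel, lam, q):
    """Coefficient-based regularized kernel network (CRKN) with l^q penalty.

    Minimizes (1/m) * ||G @ alpha - y||^2 + lam * sum_j |alpha_j|^q over
    alpha in R^m, where G[i, j] = kernel(x_i, x_j). The kernel may be
    nonsymmetric and need not be positive semidefinite.
    """
    m = len(x)
    G = np.array([[kernel(xi, xj) for xj in x] for xi in x])

    def objective(alpha):
        r = G @ alpha - y
        return r @ r / m + lam * np.sum(np.abs(alpha) ** q)

    alpha = minimize(objective, np.zeros(m), method="BFGS").x
    return lambda t: sum(a * kernel(t, xj) for a, xj in zip(alpha, x))

# A nonsymmetric "general" kernel: K(s, t) = exp(-8 (s - t)^2) * (1 + 0.3 s).
gen_kernel = lambda s, t: np.exp(-8.0 * (s - t) ** 2) * (1.0 + 0.3 * s)
x_train = np.linspace(0.0, 3.0, 8)
y_train = np.sin(x_train)
f_z = crkn_fit(x_train, y_train, gen_kernel, lam=1e-8, q=1.5)
```

For $q < 2$ the penalty is not smooth at zero, which is what promotes sparse coefficient vectors for larger $\lambda$.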

2. Assumptions and Main Results

Throughout the paper, we always assume that $|y| \le M$ almost surely for some constant $M > 0$. By this assumption, we have $|f_\rho(x)| \le M$ for any $x \in X$. We also require the kernel $K \in C(X \times X)$ with $\kappa := \sup_{x, t \in X} |K(x, t)| < \infty$, and the uniform norm on $C(X)$ is denoted by $\|\cdot\|_\infty$.

For the bounded indefinite kernel $K$, we consider the Mercer kernel
$$\widetilde{K}(x, t) = \int_X K(x, u) K(t, u) \, d\rho_X(u).$$
The integral operator associated with this kernel is defined by
$$L_{\widetilde{K}} f(x) = \int_X \widetilde{K}(x, t) f(t) \, d\rho_X(t), \qquad f \in L^2_{\rho_X},$$
which is a compact positive operator. For the approximation ability of the kernel-based hypothesis space, we assume that $f_\rho = L_{\widetilde{K}}^r g_\rho$ for some $r > 0$ and $g_\rho \in L^2_{\rho_X}$.
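The passage from an indefinite general kernel $K$ to the Mercer kernel $\widetilde{K}$ can be checked numerically: since $\widetilde{K}(x, t)$ is an $L^2_{\rho_X}$ inner product of the sections $K(x, \cdot)$ and $K(t, \cdot)$, its Gram matrices are positive semidefinite even when $K$ itself is nonsymmetric. A small Monte Carlo sketch (the particular kernel and the uniform choice of $\rho_X$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# A general (nonsymmetric) kernel on [0, 1]; it is not a Mercer kernel itself.
K = lambda x, u: np.exp(-8.0 * (x - 2.0 * u) ** 2)

# Monte Carlo approximation of K~(x, t) = ∫ K(x, u) K(t, u) dρ_X(u),
# taking ρ_X uniform on [0, 1] purely for illustration.
u = rng.random(2000)
pts = np.linspace(0.0, 1.0, 12)
A = K(pts[:, None], u[None, :])      # A[i, n] = K(x_i, u_n)
G_tilde = A @ A.T / len(u)           # Gram matrix of K~ at the points pts

# K~ is an L^2 inner product of the sections K(x, .), hence symmetric and
# positive semidefinite; the Monte Carlo Gram (1/N) A A^T inherits this exactly.
min_eig = np.linalg.eigvalsh(G_tilde).min()
```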

One of the main difficulties for the analysis of CRKN is that the hypothesis space is dependent on the sample.

Definition 1. Define the Banach space
$$\mathcal{H}_K = \left\{ f = \sum_{i=1}^{\infty} \alpha_i K(\cdot, u_i) : \alpha_i \in \mathbb{R},\ u_i \in X,\ \sum_{i=1}^{\infty} |\alpha_i| < \infty \right\}$$
with the norm
$$\|f\|_{\mathcal{H}_K} = \inf \left\{ \sum_{i=1}^{\infty} |\alpha_i| : f = \sum_{i=1}^{\infty} \alpha_i K(\cdot, u_i) \right\}.$$

The continuity and uniform boundedness of $K$ ensure that $\mathcal{H}_K$ consists of continuous functions. Denote the ball of radius $R$ as $B_R = \{ f \in \mathcal{H}_K : \|f\|_{\mathcal{H}_K} \le R \}$.

Our estimate of the sample error is conducted through a concentration inequality, which is based on the empirical covering number of $B_1$. The normalized metric $d_2$ on the Euclidean space $\mathbb{R}^k$ is defined by
$$d_2(\mathbf{a}, \mathbf{b}) = \left( \frac{1}{k} \sum_{i=1}^{k} (a_i - b_i)^2 \right)^{1/2}.$$
Under the above metric, the covering number of a subset $V$ of $\mathbb{R}^k$ with radius $\varepsilon > 0$ is denoted by $N(V, \varepsilon)$.

Let $\mathcal{F}$ be a set of functions on $X$. For $\mathbf{u} = (u_i)_{i=1}^k \in X^k$, the sampling operator $S_{\mathbf{u}} : \mathcal{F} \to \mathbb{R}^k$ is defined by $S_{\mathbf{u}}(f) = (f(u_1), \dots, f(u_k))$, for any $f \in \mathcal{F}$. The empirical covering number of $\mathcal{F}$ is defined by
$$N_2(\mathcal{F}, \varepsilon) = \sup_{k \in \mathbb{N}} \sup_{\mathbf{u} \in X^k} N\!\left( S_{\mathbf{u}}(\mathcal{F}), \varepsilon \right), \qquad \varepsilon > 0.$$
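The empirical covering number can be estimated for a concrete function class by a greedy cover; the sketch below (the constant-function class and all parameter values are our own toy choices) illustrates the definition under the normalized $\ell^2$ metric:

```python
import numpy as np

def greedy_cover_size(vectors, eps):
    """Size of a greedy eps-cover of a finite point set under the normalized
    Euclidean metric d(a, b) = sqrt((1/k) * sum_i (a_i - b_i)^2).

    Greedy covering overestimates the true covering number by at most the
    usual factor relating covers and packings.
    """
    k = vectors.shape[1]
    uncovered = np.ones(len(vectors), dtype=bool)
    n_centers = 0
    while uncovered.any():
        center = vectors[np.argmax(uncovered)]            # first uncovered point
        dist = np.sqrt(((vectors - center) ** 2).sum(axis=1) / k)
        uncovered &= dist > eps
        n_centers += 1
    return n_centers

# Illustration: constant functions {f_c : c in [0, 1]} sampled at k = 5 points;
# the normalized metric between S_u(f_c) and S_u(f_c') is exactly |c - c'|.
grid = np.linspace(0.0, 1.0, 101)
samples = np.repeat(grid[:, None], 5, axis=1)    # S_u(f_c) = (c, c, c, c, c)
n_cover = greedy_cover_size(samples, eps=0.1)
```

For this one-parameter class the cover size grows like $1/\varepsilon$, matching the intuition that the capacity exponent $p$ measures the effective richness of the class.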

A bound for the empirical covering number of $B_1$ was derived in [13]. Suppose that $K \in C^s(X \times X)$ for some $s > 0$; then
$$\log N_2(B_1, \varepsilon) \le C_p \left( \frac{1}{\varepsilon} \right)^p, \qquad \forall \varepsilon > 0, \tag{16}$$
where $C_p$ is a constant independent of $\varepsilon$, and the power index $p \in (0, 2)$ is defined by (17) in terms of the smoothness index $s$ and the dimension $n$. Our main conclusion is about the asymptotic convergence rate of the kernel network (8).

Theorem 2. Let $f_{\mathbf{z}}$ be the optimal solution of the algorithm (8). Suppose that $B_1$ satisfies the capacity condition (16) with exponent $p \in (0, 2)$, and $f_\rho = L_{\widetilde{K}}^r g_\rho$ for some $r > 0$. Taking $\lambda = \lambda(m)$ such that $\lambda \to 0$ appropriately as $m \to \infty$, we have for any $0 < \delta < 1$, with confidence $1 - \delta$, that there holds
$$\mathcal{E}(\pi(f_{\mathbf{z}})) - \mathcal{E}(f_\rho) \le C \log\frac{2}{\delta}\, m^{-\theta},$$
where $C$ is a constant independent of $m$ and $\delta$, and the exponent $\theta > 0$ is determined by $p$, $q$, and $r$.

Corollary 3. Under the assumptions of Theorem 2, suppose in addition that the kernel is smooth and the regularity index $r$ is large enough. Taking $\lambda = m^{-\gamma}$ with suitable $\gamma > 0$, for any $0 < \delta < 1$, with confidence $1 - \delta$, there holds
$$\mathcal{E}(\pi(f_{\mathbf{z}})) - \mathcal{E}(f_\rho) = O\!\left( m^{-\theta} \log\frac{2}{\delta} \right),$$
with the exponent $\theta$ determined by $p$, $q$, and $r$.

The condition on $\gamma$ is equivalent to the stated choice of $\lambda$. When the kernel $K \in C^\infty(X \times X)$, the capacity exponent $p$ can be taken arbitrarily close to $0$; thus, our learning rate is almost the optimal rate for CRKN with $\ell^q$-norm regularization; see [13].

3. Rough Error Bounds of $\ell^q$-Regularization

We propose the following new error decomposition by employing a stepping-stone approach.

Consider the regularized kernel network with the $\ell^2$ norm as the stepping stone:
$$f_{\mathbf{z}}^{(2)} = f_{\beta^{\mathbf{z}}}, \qquad \beta^{\mathbf{z}} = \arg\min_{\beta \in \mathbb{R}^m} \left\{ \mathcal{E}_{\mathbf{z}}(f_\beta) + \lambda \|\beta\|_2^2 \right\}. \tag{20}$$
Thus, we have that
$$\mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}}) + \lambda \|\alpha^{\mathbf{z}}\|_q^q \le \mathcal{E}_{\mathbf{z}}\!\left(f_{\mathbf{z}}^{(2)}\right) + \lambda \|\beta^{\mathbf{z}}\|_q^q.$$
The inequality holds by the definition of $\alpha^{\mathbf{z}}$ as the minimizer in (8).
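For orientation, the generic shape of such a stepping-stone decomposition can be recorded as follows (the notation here is schematic: $\Omega$ denotes the coefficient penalty and $f^{\star}$ an auxiliary stepping-stone function; the identity is obtained by adding and subtracting empirical risks):

```latex
\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}(f_\rho) + \lambda\,\Omega(f_{\mathbf{z}})
  = \underbrace{\bigl[\mathcal{E}(f_{\mathbf{z}}) - \mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}})\bigr]
    + \bigl[\mathcal{E}_{\mathbf{z}}(f^{\star}) - \mathcal{E}(f^{\star})\bigr]}_{\text{sample error}}
  + \underbrace{\bigl[\mathcal{E}_{\mathbf{z}}(f_{\mathbf{z}}) + \lambda\,\Omega(f_{\mathbf{z}})\bigr]
    - \bigl[\mathcal{E}_{\mathbf{z}}(f^{\star}) + \lambda\,\Omega(f^{\star})\bigr]}_{\le\, 0
    \text{ by minimality of } f_{\mathbf{z}}}
  + \underbrace{\mathcal{E}(f^{\star}) - \mathcal{E}(f_\rho)
    + \lambda\,\Omega(f^{\star})}_{\text{stepping-stone error}}
```

Dropping the nonpositive middle bracket bounds the excess error by the sample error plus the stepping-stone error, which is the pattern followed in this section.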

Let $\pi(f)(x) = \min\{M, \max\{-M, f(x)\}\}$ denote the truncation of $f$ onto $[-M, M]$ for any measurable function $f$. Since $|y| \le M$ almost surely, for any $f$ we have
$$\mathcal{E}(\pi(f)) \le \mathcal{E}(f), \qquad \mathcal{E}_{\mathbf{z}}(\pi(f)) \le \mathcal{E}_{\mathbf{z}}(f).$$

The following concentration inequality is a special case of the corresponding lemma in [13], with a particular choice of its free parameter.

Lemma 4. Let $\mathcal{F}$ be a class of measurable functions on $Z$. Suppose that there exist constants $B, c > 0$ such that $\|f\|_\infty \le B$ and $\mathbb{E} f^2 \le c\, \mathbb{E} f$ for every $f \in \mathcal{F}$. Moreover, there are some $a > 0$ and $p \in (0, 2)$, such that
$$\log N_2(\mathcal{F}, \varepsilon) \le a \varepsilon^{-p}, \qquad \forall \varepsilon > 0.$$
Then, for any $0 < \delta < 1$, with confidence $1 - \delta$, there holds, for any $f \in \mathcal{F}$,
$$\mathbb{E} f - \frac{1}{m} \sum_{i=1}^{m} f(z_i) \le \frac{1}{2}\, \mathbb{E} f + C_0 \max\!\left\{ c^{\frac{2-p}{2+p}}, B^{\frac{2-p}{2+p}} \right\} \left( \frac{a}{m} \right)^{\frac{2}{2+p}} + \frac{2(c + 3B) \log(2/\delta)}{3m},$$
where $C_0$ is a constant only depending on $p$.

For any $R > 0$, the function set $\mathcal{F}_R$ is defined by
$$\mathcal{F}_R = \left\{ (\pi(f)(x) - y)^2 - (f_\rho(x) - y)^2 : f \in B_R \right\}.$$
Now we verify that the conditions in Lemma 4 hold. Firstly, for any $g \in \mathcal{F}_R$ induced by $f \in B_R$, $\|g\|_\infty \le 8M^2$ and
$$\mathbb{E} g = \mathcal{E}(\pi(f)) - \mathcal{E}(f_\rho) = \|\pi(f) - f_\rho\|_{L^2_{\rho_X}}^2 \ge 0.$$
Secondly, for any $g \in \mathcal{F}_R$,
$$\mathbb{E} g^2 \le 16 M^2\, \mathbb{E} g.$$
Therefore, since $|g_1 - g_2| \le 4M |f_1 - f_2|$ pointwise, we have that, for $\varepsilon > 0$,
$$\log N_2(\mathcal{F}_R, \varepsilon) \le C_p \left( \frac{4MR}{\varepsilon} \right)^p.$$

For any vector $\alpha \in \mathbb{R}^m$, Hölder's inequality gives
$$\|\alpha\|_1 \le m^{1 - 1/q} \|\alpha\|_q.$$

Denote $\mathcal{W}(R) = \{ \mathbf{z} \in Z^m : \|\alpha^{\mathbf{z}}\|_1 \le R \}$. For any $\mathbf{z} \in \mathcal{W}(R)$, there is $f_{\mathbf{z}} \in B_R$.

Applying (22) and Lemma 4 to $\mathcal{F}_R$ with $B = 8M^2$ and $c = 16M^2$, it follows that there is a subset $V_R$ of $Z^m$ with measure at most $\delta$ such that, for every $\mathbf{z} \notin V_R$ and every $f \in B_R$,
$$\mathcal{E}(\pi(f)) - \mathcal{E}(f_\rho) \le 2 \left( \mathcal{E}_{\mathbf{z}}(\pi(f)) - \mathcal{E}_{\mathbf{z}}(f_\rho) \right) + C_1 \left( \frac{R^p}{m} \right)^{\frac{2}{2+p}} + \frac{C_2 \log(2/\delta)}{m},$$
where $C_1$ and $C_2$ are constants independent of $R$, $m$, and $\delta$.

Plugging this estimate into (21) yields a rough bound for the excess error of $f_{\mathbf{z}}$ in terms of the stepping-stone function. The following bound on the excess generalization error of the $\ell^2$ scheme (20) was proved in [15].

Proposition 5. Let $f_{\mathbf{z}}^{(2)}$ be the stepping-stone estimator given by (20) and $f_\rho = L_{\widetilde{K}}^r g_\rho$ with some $r > 0$. Suppose that the capacity condition (16) holds for some $p \in (0, 2)$ and $C_p > 0$. Then, for any $0 < \delta < 1$, take $\lambda = m^{-\gamma}$ with suitable $\gamma > 0$, where $p$ is defined by (17); then, with probability $1 - \delta$ there holds
$$\mathcal{E}\!\left(f_{\mathbf{z}}^{(2)}\right) - \mathcal{E}(f_\rho) \le \widetilde{C} \log\frac{2}{\delta}\, m^{-\theta_0},$$
where $\widetilde{C}$ and $\theta_0 > 0$ are constants independent of $m$ and $\delta$.

Combining this estimate with (31) leads to Proposition 6.

Proposition 6. Let $f_\rho = L_{\widetilde{K}}^r g_\rho$ with some $r > 0$. Assume that the capacity condition (16) holds for some $p \in (0, 2)$. Taking $\lambda = m^{-\gamma}$ with suitable $\gamma > 0$ and $R \ge 1$, then for any $0 < \delta < 1$ there is a subset $V_R$ of $Z^m$ with measure at most $\delta$ such that, for any $\mathbf{z} \notin V_R$ with $\|\alpha^{\mathbf{z}}\|_1 \le R$,
$$\mathcal{E}(\pi(f_{\mathbf{z}})) - \mathcal{E}(f_\rho) \le C_3 \log\frac{2}{\delta} \left( m^{-\theta_0} + \left( \frac{R^p}{m} \right)^{\frac{2}{2+p}} \right),$$
where $C_3$ and $\theta_0 > 0$ are constants independent of $m$, $\delta$, and $R$.

4. Refined Error Bound by Iteration

In this section, we apply an iteration technique to deduce a refined error bound. Proposition 6 ensures that, for samples $\mathbf{z}$ with $\|\alpha^{\mathbf{z}}\|_1 \le R$ outside an exceptional set of measure at most $\delta$, the excess error is controlled in terms of $R$. Choosing $\alpha = 0$ in (8) gives $\lambda \|\alpha^{\mathbf{z}}\|_q^q \le \mathcal{E}_{\mathbf{z}}(0) \le M^2$, which together with $\|\alpha^{\mathbf{z}}\|_1 \le m^{1 - 1/q} \|\alpha^{\mathbf{z}}\|_q$ yields the initial rough radius
$$R_0 := m^{1 - 1/q} \left( \frac{M^2}{\lambda} \right)^{1/q}.$$

It follows that, for any $k \in \mathbb{N}$, plugging the current radius $R_k$ into the bound of Proposition 6 controls the penalty term $\lambda \|\alpha^{\mathbf{z}}\|_q^q$ and hence produces an improved radius $R_{k+1} \le R_k$. Let $R_0$ be the initial rough radius and $\{R_k\}$ the resulting sequence. Hence, the admissible radius can be updated iteratively.

For $k$ large enough, the improvement saturates at the fixed point of the updating map; thus the final radius is bounded by a power of $m$. A simple computation (see [13]) gives the refined bound
$$\mathcal{E}(\pi(f_{\mathbf{z}})) - \mathcal{E}(f_\rho) \le C_4 \log\frac{2}{\delta}\, m^{-\theta},$$
where $C_4$ and $\theta > 0$ are constants independent of $m$ and $\delta$, and $\theta$ is obtained by choosing $\gamma$ to balance the two terms in the bound of Proposition 6 as $m \to \infty$.
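The effect of such an iteration can be seen on a toy self-bounding inequality (purely illustrative; the constants and the square-root form are hypothetical, not the paper's actual recursion). Suppose a rough argument yields $R \le A + B\sqrt{R}$ for the admissible radius; feeding a crude initial bound repeatedly back into the right-hand side drives it down to the fixed point:

```python
# Toy illustration of refining a self-bounding inequality R <= A + B * sqrt(R)
# by iteration (the constants A, B and the functional form are hypothetical).
A, B = 1.0, 0.5
R = 100.0                     # crude initial bound
for _ in range(30):
    R = A + B * R ** 0.5      # plug the current bound back into the inequality
# R now sits, to numerical precision, at the fixed point R* = A + B * sqrt(R*),
# which is far smaller than the crude starting value.
```

The update map is a contraction near the fixed point, so a fixed number of iterations (here 30) already gives machine-precision convergence; in the paper's setting the analogous saturation yields the polynomial-order radius used in the proof of Theorem 2.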

Proof of Theorem 2. By taking $R$ to be the limiting radius of the above iteration and replacing $\delta$ by $\delta/2$ in Proposition 6 and the above discussion, with confidence $1 - \delta/2$ the estimate (38) assures that
$$\mathcal{E}(\pi(f_{\mathbf{z}})) - \mathcal{E}(f_\rho) \le C \log\frac{4}{\delta}\, m^{-\theta},$$
where $C$ is a constant independent of $m$ and $\delta$, and the quantities involved are defined by (39), (40), and (41). Proposition 6 ensures that, with confidence $1 - \delta/2$, the sample $\mathbf{z}$ indeed satisfies $\|\alpha^{\mathbf{z}}\|_1 \le R$. Combining the above two inequalities completes the proof.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

Zhongfeng Qu is supported by the Shandong Provincial Scientific Research Foundation for Excellent Young Scientists (no. BS2013SF003). Hongwei Sun is supported by the Natural Science Foundation of Shandong Province, China (no. ZR2014AM010).