Probability Error Bounds for Approximation of Functions in Reproducing Kernel Hilbert Spaces

Aydın, Ata Deniz; Gheondea, Aurelian

doi:https://doi.org/10.1155/2021/6617774

Journal of Function Spaces


Research Article | Open Access

Volume 2021 | Article ID 6617774 | https://doi.org/10.1155/2021/6617774


Ata Deniz Aydın and Aurelian Gheondea

Academic Editor: Seppo Hassi

Received 03 Dec 2020

Revised 09 Mar 2021

Accepted 23 Mar 2021

Published 03 May 2021

Abstract

We find probability error bounds for approximations of functions $f$ in a separable reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K$ on a base space $X$, firstly in terms of finite linear combinations of functions of type $K_{x_i}$ and then in terms of the projection $\pi_x^n$ on $\operatorname{span}\{K_{x_i}\}_{i=1}^n$, for random sequences of points $x=(x_i)_i$ in $X$. Given a probability measure $P$, letting $P_K$ be the measure defined by $dP_K(x)=K(x,x)\,dP(x)$, $x\in X$, our approach is based on the nonexpansive operator $L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X\lambda(x)K_x\,dP(x)\in\mathcal{H}$, where the integral exists in the Bochner sense. Using this operator, we then define a new reproducing kernel Hilbert space, denoted by $\mathcal{H}_P$, that is the operator range of $L_{P,K}$. Our main result establishes bounds, in terms of the operator $L_{P,K}$, on the probability that the Hilbert space distance between an arbitrary function $f$ in $\mathcal{H}$ and linear combinations of functions of type $K_{x_i}$, for $(x_i)_i$ sampled independently from $P$, falls below a given threshold. For sequences of points $(x_i)_{i=1}^\infty$ constituting a so-called uniqueness set, the orthogonal projections $\pi_x^n$ to $\operatorname{span}\{K_{x_i}\}_{i=1}^n$ converge in the strong operator topology to the identity operator. We prove that, under the assumption that $\mathcal{H}_P$ is dense in $\mathcal{H}$, any sequence of points sampled independently from $P$ yields a uniqueness set with probability 1. This result improves on previous error bounds in weaker norms, such as uniform or $L^p$ norms, which yield only convergence in probability and not almost certain convergence. Two examples that show the applicability of this result to a uniform distribution on a compact interval and to the Hardy space $H^2(\mathbb{D})$ are presented as well.

1. Introduction

Several machine learning algorithms that use positive semidefinite kernels, such as support vector machines (SVM), have been analysed and justified rigorously using the theory of reproducing kernel Hilbert spaces (RKHS), yielding statements of optimality, convergence, and approximation bounds, e.g., see Cucker and Smale [1]. Reproducing kernel Hilbert spaces are Hilbert spaces of functions associated to a suitable kernel such that convergence with respect to the Hilbert space norm implies pointwise convergence, and in the context of approximation possess various favourable properties resulting from the Hilbert space structure. For example, under certain conditions on the kernel, every function in the Hilbert space is sufficiently differentiable, and differentiation is in fact a nonexpansive linear map with respect to the Hilbert space norm, e.g., see ([2], Subsection 2.1.3).

To motivate our investigation, we briefly review previously obtained bounds on the approximation of functions as linear combinations of kernels evaluated at finitely many points. The statistical learning theory of Vapnik and Chervonenkis [3–5] relies on concentration inequalities such as Hoeffding’s inequality to bound the supremum distance between expected and empirical risk. The theory considers a data space on which an unknown probability distribution is defined, a hypothesis set , and a loss function , such that one wishes to find a hypothesis that minimizes the expected risk

Since is not known in general, instead of minimizing the expected risk one usually minimizes the empirical risk over a finite set of samples. Vapnik-Chervonenkis theory measures the probability with which the maximum distance between and falls below a given threshold. Recall that the Vapnik-Chervonenkis (VC) dimension of with respect to is the maximum cardinality of finite subsets that can be shattered by , i.e. for each , there exist and such that

Thus, they prove that, assuming that for each and the VC dimension of is , then, for any ,

Girosi, see [6] and ([7], Proposition 2), has used this general result to bound the uniform distance between integrals and sums of the form , by reinterpreting as , as , and as . Kon and Raphael [7] then applied this methodology to obtain uniform approximation bounds of functions in reproducing kernel Hilbert spaces. They consider two cases where the Hilbert space is dense in with a stronger norm ([7], Theorem 4), and where it is a closed subspace with the same norm ([7], Theorem 5). Also, Kon et al. [8] extended Girosi’s approximation estimates for functions in Sobolev spaces. While these bounds guarantee uniform convergence in probability, the approximating functions are neither orthogonal projections of nor necessarily elements of a reproducing kernel Hilbert space and hence may not capture exactly at nor converge monotonically. Furthermore, the fact that the norm is not a RKHS norm means that derivatives of may not be approximated in general, since differentiation is not bounded with respect to the uniform norm, unlike the RKHS norm associated with a continuously differentiable kernel.

The purpose of this article is thus to establish sufficient conditions for convergence and approximation in the reproducing kernel Hilbert space norm. In Section 3, we find probability error bounds for approximations of functions $f$ in a separable reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K$ on a base space $X$, firstly in terms of finite linear combinations of functions of type $K_{x_i}$ and then in terms of the projection $\pi_x^n$ onto $\operatorname{span}\{K_{x_i}\}_{i=1}^n$, for random sequences of points $x=(x_i)_i$ in the base space $X$. Given a probability measure $P$, letting $P_K$ be the measure defined by $dP_K(x)=K(x,x)\,dP(x)$, $x\in X$, we approach these problems by first showing the existence of the nonexpansive operator $L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X\lambda(x)K_x\,dP(x)\in\mathcal{H}$, where the integral exists in the Bochner sense. Using this operator, we then define a new reproducing kernel Hilbert space, denoted by $\mathcal{H}_P$, that is the operator range of $L_{P,K}$. Our main result establishes bounds, in terms of the operator $L_{P,K}$, on the probability that the Hilbert space distance between an arbitrary function $f$ in $\mathcal{H}$ and linear combinations of functions of type $K_{x_i}$, for $(x_i)_i$ sampled independently from $P$, falls below a given threshold, see Theorem 8. For sequences of points $(x_i)_{i=1}^\infty$ constituting a so-called uniqueness set, see Subsection 3.4, the orthogonal projections $\pi_x^n$ onto $\operatorname{span}\{K_{x_i}\}_{i=1}^n$ converge in the strong operator topology to the identity operator. As an application of our main result, we show that, under the assumption that $\mathcal{H}_P$ is dense in $\mathcal{H}$, any sequence of points sampled independently from $P$ yields a uniqueness set with probability 1.

The results obtained in this article improve on the results obtained by Kon and Raphael in several respects: the convergence of approximations is in the RKHS norm, which is stronger than the uniform norm whenever the kernel is bounded; the type of convergence with respect to the points is strengthened from convergence in probability to almost certain convergence; and the separability of $\mathcal{H}$ then allows the result to be extended from the approximation of a single function to the simultaneous approximation of all functions in the Hilbert space. In addition, when compared to the existing methods for problems of this kind, our approach, based on the operator $L_{P,K}$ defined at (5), which encodes the interplay between the kernel $K$ and the probability measure $P$, and on the associated RKHS $\mathcal{H}_P$, is completely new and has the potential to overcome many difficulties.

These results are confined to the special case of a separable RKHS of functions on an arbitrary set $X$, for several reasons, one of them being that the Bochner integral requires the assumption of separability. We do not see this as a loss of generality, since most of the spaces of interest for applications are separable. In the last section, we present two examples that point out the applicability of our results, and their limitations as well: the first concerns the uniform probability distribution on a compact interval, together with a class of bounded continuous kernels, and the second concerns the Hardy space $H^2(\mathbb{D})$ corresponding to the Szegö kernel, which is unbounded. In each case, we can explicitly calculate the space $\mathcal{H}_P$, its reproducing kernel $K_P$, and the operator $L_{P,K}$.

2. Notation and Preliminary Results

2.1. Reproducing Kernel Hilbert Spaces

In this subsection, we briefly review some concepts and facts on reproducing kernel Hilbert spaces, following classical texts such as Aronszajn [9, 10] and Schwartz [11], or more modern ones such as Saitoh and Sawano ([2], Chapter 2) and Paulsen and Raghupathi [12].

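Throughout this article, we denote by $\mathbb{F}$ one of the commutative fields $\mathbb{R}$ or $\mathbb{C}$. For a nonempty set $X$, let $\mathbb{F}^X$ denote the set of $\mathbb{F}$-valued functions on $X$, forming an $\mathbb{F}$-vector space under pointwise addition and scalar multiplication. For each $x\in X$, the evaluation map at $x$ is the linear functional $\mathrm{ev}_x:\mathbb{F}^X\to\mathbb{F}$, $\mathrm{ev}_x(f)=f(x)$.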

The evaluation maps equip with the locally convex topology of pointwise convergence, which is the weakest topology on that renders each evaluation map continuous. Under this topology, a generalized sequence in converges if and only if it converges pointwise, i.e., its image under each evaluation map converges. Since each evaluation map is linear and hence the vector space operations are continuous, this renders into a complete Hausdorff locally convex space. With respect to this topology, if is a topological space, a map is continuous if and only if is continuous for all .

We are interested in Hilbert spaces with topologies at least as strong as the topology of pointwise convergence of , so that the convergence of a sequence of functions in implies that the functions also converge pointwise. When is a finite set, , where is the number of elements of , can itself be made into a Hilbert space with a canonical inner product , or in general by an inner product induced by a positive semidefinite matrix. This leads to the concept of reproducing kernel Hilbert spaces.

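Recalling the Riesz representation theorem for bounded linear functionals on Hilbert spaces, if each evaluation map $\mathrm{ev}_x$ restricted to $\mathcal{H}$ is continuous, then, for each $x\in X$, there exists a unique vector $K_x\in\mathcal{H}$ such that $\mathrm{ev}_x=\langle\cdot,K_x\rangle_{\mathcal{H}}$ on $\mathcal{H}$. But, since each vector in $\mathcal{H}$ is itself a function $X\to\mathbb{F}$, these vectors altogether define a map $K:X\times X\to\mathbb{F}$, $K(y,x)=K_x(y)$. Also, recall that a map $K:X\times X\to\mathbb{F}$ is usually called a kernel.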

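Definition 1. Let $\mathcal{H}\subseteq\mathbb{F}^X$ be a Hilbert space, $K:X\times X\to\mathbb{F}$ a kernel. For each $x\in X$ define $K_x:=K(\cdot,x)\in\mathbb{F}^X$. $K$ is said to be a reproducing kernel for $\mathcal{H}$, and $\mathcal{H}$ is then said to be a reproducing kernel Hilbert space (RKHS), if, for each $x\in X$, we have
(i) $K_x\in\mathcal{H}$;
(ii) $\mathrm{ev}_x=\langle\cdot,K_x\rangle_{\mathcal{H}}$ on $\mathcal{H}$, that is, for every $f\in\mathcal{H}$ we have $f(x)=\langle f,K_x\rangle_{\mathcal{H}}$.
The second property is referred to as the reproducing property of the kernel $K$.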

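We may then summarize the last few paragraphs with the following characterization: Let $\mathcal{H}\subseteq\mathbb{F}^X$ be a Hilbert space. The following assertions are equivalent:
(i) The canonical injection $\mathcal{H}\hookrightarrow\mathbb{F}^X$ is continuous
(ii) For each $x\in X$, the map $\mathrm{ev}_x|_{\mathcal{H}}$ is continuous
(iii) $\mathcal{H}$ admits a reproducing kernel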

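In that case, the reproducing kernel admitted by the Hilbert space $\mathcal{H}$ is unique, by the uniqueness of the Riesz representatives of the evaluation maps. We may further apply the reproducing property to each $K_y$ to obtain that $K(x,y)=K_y(x)=\langle K_y,K_x\rangle_{\mathcal{H}}$ for each $x,y\in X$, yielding the following properties:
(i) For each $x,y\in X$, $K(x,y)=\overline{K(y,x)}$,
(ii) For each $x,y\in X$, $|K(x,y)|^2\le K(x,x)\,K(y,y)$, (7) and
(iii) For each $n\in\mathbb{N}$, $x_1,\dots,x_n\in X$, and $c_1,\dots,c_n\in\mathbb{F}$, $\sum_{i,j=1}^n\overline{c_i}\,c_j\,K(x_i,x_j)\ge 0$.

The property in (7) is the analogue of the Schwarz Inequality. As a consequence of it, if $K(x,x)=0$ for some $x\in X$ then $K(x,y)=0$ for all $y\in X$.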

For any , each so we may define the subspace

of . If is the reproducing kernel of a Hilbert space , is also a subspace of and therefore, is a dense subspace of , equivalently, is a total set for .

The property at item (iii) is known as the positive semidefiniteness property. A positive semidefinite kernel is called definite if for all . Positive semidefiniteness is in fact sufficient to characterize all reproducing kernels. By the Moore-Aronszajn Theorem, for any positive semidefinite kernel , there is a unique Hilbert space with reproducing kernel .

Let us briefly recall the construction of the Hilbert space in the proof. We first render into a pre-Hilbert space satisfying the reproducing property. Define on the inner product for any . It is proven that the definition is correct and provides indeed an inner product.

Let be the completion of , then is a Hilbert space with an isometric embedding whose image is dense in . It is proven that this abstract completion can actually be realized in and that it is the RKHS with reproducing kernel that we denote by .

In applications, one of the most useful tools is the interplay between reproducing kernels and orthonormal bases of the underlying RKHSs. Although this fact holds in higher generality, we state it for separable Hilbert spaces since, most of the time, this is the case of interest: if $\mathcal{H}$ is a separable RKHS with reproducing kernel $K$ and $\{\phi_n\}_n$ is an orthonormal basis of $\mathcal{H}$, then $K(x,y)=\sum_n\phi_n(x)\overline{\phi_n(y)}$, $x,y\in X$, (12) where the series converges absolutely pointwise.

We now recall a useful result on the construction of new RKHSs and positive semidefinite kernels from existing ones. It also shows that the concept of reproducing kernel Hilbert space is actually a special case of the concept of operator range. Let be a Hilbert space, a continuous linear map. Then with the norm is a RKHS, unitarily isomorphic to . The kernel for is then given by the map where such that on . Applying this proposition to particular continuous linear maps, one obtains useful results for pullbacks, restrictions, sums, scaling, and normalizations of kernels.

2.2. Integration of RKHS-Valued Functions

In this article, we use integrals of Hilbert space-valued functions. We first provide fundamental definitions and properties concerning the Bochner integral, an extension of the Lebesgue integral for Banach space-valued functions, following Cohn ([13], Appendix E).

Let be a (real or complex) Banach space and a finite measure space. On , we consider the Borel -algebra denoted by . A map is called measurable if for all , and it is called strongly measurable if it is measurable and its range is separable. If is a separable Banach space then the concepts coincide. Both sets of measurable functions, respectively, strongly measurable functions, are vector spaces. It is proven that a function is strongly measurable if and only if there exists a sequence of simple functions such that pointwise on . In addition, in this case, the sequence can be chosen such that for all .

A function is Bochner integrable if it is strongly measurable and the scalar function is integrable. In this case, the Bochner integral of is defined by approximation with simple functions. Bochner integrable functions share many properties with scalar-valued integrable functions, but not all. For example, the collection of all Bochner integrable functions forms a vector space, and, for any Bochner integrable function , we have

Also, letting denote the collection of all equivalence classes of Bochner integrable functions, identified -almost everywhere, this is a Banach space with norm

In addition, the Dominated Convergence Theorem holds for the Bochner integral as well, e.g., see ([13], Theorem E.6).

In this article, we will use the following result, which is a special case of a theorem of Hille, e.g., see ([14], Theorem III.2.6). In Hille’s Theorem, the linear transformation is supposed to be only closed, and, consequently, additional assumptions are needed, so we provide a proof for the special case of bounded linear operators for the reader’s convenience.

Theorem 2. Let be a Banach space, a measure space, and a Bochner integrable function. If is a continuous linear transformation between Banach spaces, then is Bochner integrable and

Proof. Since is Bochner integrable, there exists a sequence of simple functions that converges pointwise to on and for all and all . Then, hence, the sequence converges pointwise to . Also, it is easy to see that is a simple function for all . These show that is strongly measurable. Since for all and is Bochner integrable, it follows that hence, is Bochner integrable.

On the other hand, hence, by the Dominated Convergence Theorem for the Bochner integral, it follows that ☐

A direct consequence of this fact is a sufficient condition for when a pointwise integral coincides with the Bochner integral, valid not only for RKHSs but also for Banach spaces of functions on which evaluation maps at any point are continuous, e.g., for some compact Hausdorff space .

Proposition 3. Let be a measure space, a Banach space of functions on , such that all evaluation maps on are continuous. Let be such that for each we have .

If, for each , the map is Bochner integrable, then the scalar map is integrable, for each fixed .

Moreover, in that case, the pointwise integral map lies in and coincides with the Bochner integral .

Proof. Since, for each , the map is Bochner integrable, and taking into account that, for all , the linear functional is continuous, by Theorem 2, we have Since for all , this means that the scalar map is integrable, for each fixed , and hence, the pointwise integral map lies in and coincides with the Bochner integral .☐

3. Main Results

Throughout this section, we consider a probability measure space $(X;\Sigma;P)$ and a RKHS $\mathcal{H}$ in $\mathbb{F}^X$, with norm denoted by $\|\cdot\|_{\mathcal{H}}$, such that its reproducing kernel $K$ is measurable. In addition, throughout this section, the reproducing kernel Hilbert space $\mathcal{H}$ is supposed to be separable.

3.1. The Reproducing Kernel Hilbert Space $\mathcal{H}_P$

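On the measurable space $(X;\Sigma)$, we define the measure $P_K$ by $dP_K(x):=K(x,x)\,dP(x)$, $x\in X$, (24) that is, $P_K$ is the absolutely continuous measure with respect to $P$ such that the function $X\ni x\mapsto K(x,x)$ is the Radon-Nikodym derivative of $P_K$ with respect to $P$.

With respect to the measure space $(X;\Sigma;P_K)$, we consider the Hilbert space $L^2(X;P_K)$. Our approach is based on the following natural bounded linear operator mapping $L^2(X;P_K)$ to $\mathcal{H}$.

Proposition 4. With notation and assumptions as before, let $\lambda:X\to\mathbb{F}$ be a measurable function such that the integral $\int_X|\lambda(x)|^2K(x,x)\,dP(x)$ is finite. Then, the Bochner integral $\int_X\lambda(x)K_x\,dP(x)$ exists in $\mathcal{H}$.

In addition, the mapping $L^2(X;P_K)\ni\lambda\mapsto L_{P,K}\lambda:=\int_X\lambda(x)K_x\,dP(x)\in\mathcal{H}$ (26) is a nonexpansive, hence, bounded, linear operator.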

Proof. By assumptions, the map is measurable, and, since is separable, it follows that this map is actually strongly measurable. Letting denote the norm on and using the assumption that is finite, we have hence, by the Schwarz Inequality and taking into account that is a probability measure, we have

By Theorem 2, this implies that the Bochner integral exists in . Consequently, the mapping as in (26) is correctly defined, and it is clear that it is a linear transformation.

For arbitrary , by the triangle inequality for the Bochner integral (15), we then have and applying the Schwarz Inequality for the integral and taking into account that is a probability measure hence, is a nonexpansive linear operator.
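The nonexpansiveness asserted in Proposition 4 can be checked numerically in a toy setting. The following sketch is not taken from the article: it assumes a finite base space of `m` points, so that the kernel becomes a positive semidefinite matrix `K`, the probability measure a vector `p`, and the Bochner integral a finite sum. The names `operator_norms`, `K`, `p`, and `lam` are illustrative only.

```python
import numpy as np

def operator_norms(K, p, lam):
    """Return (||L lam||_H^2, ||lam||_{L^2(P_K)}^2) on a finite base space.

    Assumption: X = {0, ..., m-1}, K is a real positive semidefinite matrix,
    p is a probability vector, and L lam = sum_x p[x] * lam[x] * K_x.
    """
    c = p * lam                                   # coefficients of L lam in the K_x
    image_sq = float(c @ K @ c)                   # ||sum_x c[x] K_x||_H^2 = c^T K c
    domain_sq = float(np.sum(p * np.diag(K) * lam**2))  # integral of |lam|^2 dP_K
    return image_sq, domain_sq

rng = np.random.default_rng(0)
m = 6
A = rng.standard_normal((m, 3))
K = A @ A.T                                       # positive semidefinite kernel matrix
p = rng.random(m)
p /= p.sum()                                      # probability vector
lam = rng.standard_normal(m)

image_sq, domain_sq = operator_norms(K, p, lam)
print(image_sq <= domain_sq + 1e-12)
```

In this finite model the inequality follows from the same two steps as in the proof: the triangle inequality for the sum, then the Schwarz inequality with respect to `p`.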

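Using the bounded linear operator $L_{P,K}$ defined as in (26), let us denote its range by $\mathcal{H}_P:=\operatorname{Ran}(L_{P,K})$, (31)

which is a subspace of the RKHS $\mathcal{H}$.

Proposition 5. $\mathcal{H}_P$ is a RKHS contained in $\mathcal{H}$, hence, in $\mathbb{F}^X$, and its reproducing kernel is $K_P(x,y)=\int_X\frac{K(x,z)\,K(z,y)}{K(z,z)}\,dP(z)$, where whenever $K(z,z)=0$, by convention we define $K(x,z)/K(z,z)=0$ for all $x\in X$.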

Proof. Since is a Hilbert space and is a bounded linear map, by (13) it follows that is a RKHS in , isometrically isomorphic to the orthogonal complement of , and its norm is given by

Let and let us define by

From the Schwarz Inequality for the kernel , it follows that if then for all . This shows that for all .

For each , by the Schwarz inequality and the fact that is a probability measure, we have

hence, . Then, taking into account that for all and all , it follows that, for each and , we have

In conclusion, is exactly the representative for the functional so, by (13) the kernel of is and, using the convention that whenever and for arbitrary , ☐
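Since the displayed formulas of the proof are not legible in this text, the following is a short sketch of how the kernel of the operator range can be computed. It is a reconstruction under assumed notation, not the authors' own display: $L=L_{P,K}$ denotes the operator $\lambda\mapsto\int_X\lambda(z)K_z\,dP(z)$, $dP_K(z)=K(z,z)\,dP(z)$, $K_z=K(\cdot,z)$, the inner product is taken linear in the first variable, and quotients by $K(z,z)$ are read as $0$ where $K(z,z)=0$.

```latex
% Sketch under the assumptions stated above; g is an arbitrary element of H.
\begin{align*}
\langle L\lambda, g\rangle_{\mathcal H}
  &= \int_X \lambda(z)\,\langle K_z, g\rangle_{\mathcal H}\,dP(z)
   = \int_X \lambda(z)\,\overline{g(z)}\,dP(z)
   = \int_X \lambda(z)\,\overline{\frac{g(z)}{K(z,z)}}\,dP_K(z), \\
\text{hence}\quad (L^{*}g)(z) &= \frac{g(z)}{K(z,z)}, \\
K_P(x,y) &= (L L^{*} K_y)(x)
   = \int_X \frac{K_y(z)}{K(z,z)}\,K_z(x)\,dP(z)
   = \int_X \frac{K(x,z)\,K(z,y)}{K(z,z)}\,dP(z).
\end{align*}
```

The first equality uses that the Bochner integral commutes with the bounded functional $\langle\cdot,g\rangle_{\mathcal H}$ (Theorem 2), and the identification of the kernel of the range with $LL^{*}K_y$ is the operator range construction recalled in Subsection 2.1.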

One of the main results of this article, see Theorem 11, assumes that the space $\mathcal{H}_P$ is dense in $\mathcal{H}$. The next proposition provides sufficient conditions for this.

Proposition 6. Let be a topological space, a Borel probability measure on , a RKHS with measurable kernel , and let , , and defined as in (24), (26), and (31), respectively.

Suppose that $K$ is continuous on $X\times X$, that , and that $P$ is strictly positive on any nonempty open subset of $X$. Then, $\mathcal{H}_P$ is dense in $\mathcal{H}$.

Proof. The assertion is clearly equivalent to showing that the orthogonal complement of in is the null space. To this end, let , . That is, for each , we have

Then noting the fact that is a Bochner integral, and hence, by Theorem 2, it commutes with inner products,

By assumption, , so we can take to obtain

This implies that -almost everywhere, i.e., the set has zero measure.

Since is continuous by assumption, by Theorem 2.3 in ([2], Section 2.1.3), each is continuous, hence is an open subset of . But, since is assumed strictly positive on any nonempty open set, it follows that must be empty, hence, identically. ☐

3.2. Probability Error Bounds of Approximation

The first step in our enterprise is to find error bounds for approximations of functions in the reproducing kernel Hilbert space in terms of distributional finite linear combinations of functions of type . To do that, we use the celebrated Markov-Bienaymé-Chebyshev Inequality on the concentration of probability measures to obtain regions of large measure with small approximation error, in terms of the Hilbert space norm and not simply the uniform norm.

Theorem 7. (Markov-Bienaymé-Chebyshev Inequality) Let be a probability space, a Banach space, and let be two Borel measurable functions. Then, for any , we have

The classical Bienaymé-Chebyshev Inequality is obtained from (43) applied for , , and , for , where is the expected value of the random variable and is the variance of .

Theorem 8. With notation and assumptions as before, let and . For each and , consider the set

Then, letting $P^n$ denote the product probability measure on $X^n$ and defining the bounded linear operator $L_{P,K}$ as in (26), we have

Proof. By Proposition 4, the Bochner integral exists in and the linear operator is well-defined and bounded. In order to simplify the notation, considering the function defined by observe that is measurable and for each , we have

Then, we have

Since is a probability measure, we have

On the other hand, by Fubini’s theorem and the fact that the Bochner integral commutes with continuous linear operations, see Theorem 2, we have

Also, for each , and, for each ,

Integrating both sides of (49) and using all the previous equalities, we therefore have

Finally, in view of the Markov-Bienaymé-Chebyshev Inequality as in (43), when is replaced by and by , and taking into account the previous equality and (48), we get which is the required inequality.☐
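The key step of the proof, the explicit computation of the mean squared distance, can be sanity-checked by brute force. The sketch below is an illustration under stated assumptions, not the article's own formula: it takes a finite base space, assumes the approximant has the sample-mean form $\frac{1}{n}\sum_{i}\lambda(x_i)K_{x_i}$, and compares the exact expectation over all $n$-tuples with the bias-variance expression $\|f-L\lambda\|^2+\frac{1}{n}\big(\|\lambda\|^2_{L^2(P_K)}-\|L\lambda\|^2\big)$ that a direct calculation gives. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

def squared_errors(K, p, lam, a, n):
    """Yield (probability, squared H-distance) for every n-tuple of sample indices.

    Assumption: X = {0, ..., m-1}, f = sum_x a[x] K_x, and the approximant is
    (1/n) * sum_i lam[x_i] * K_{x_i} for i.i.d. indices x_i drawn from p.
    """
    m = len(p)
    for xs in itertools.product(range(m), repeat=n):
        b = np.zeros(m)
        for x in xs:
            b[x] += lam[x] / n                    # coefficients of the approximant
        d = a - b
        yield float(np.prod(p[list(xs)])), float(d @ K @ d)

def closed_form(K, p, lam, a, n):
    """Bias-variance expression for the mean squared distance."""
    c = p * lam                                   # coefficients of L lam
    bias = float((a - c) @ K @ (a - c))           # ||f - L lam||_H^2
    var = float(np.sum(p * np.diag(K) * lam**2) - c @ K @ c)
    return bias + var / n

rng = np.random.default_rng(1)
m, n, delta = 4, 3, 1.0
A = rng.standard_normal((m, 2))
K = A @ A.T
p = rng.random(m)
p /= p.sum()
lam = rng.standard_normal(m)
a = rng.standard_normal(m)

pairs = list(squared_errors(K, p, lam, a, n))
exact_mean = sum(w * e for w, e in pairs)
tail = sum(w for w, e in pairs if e >= delta**2)  # P^n(distance >= delta)
print(abs(exact_mean - closed_form(K, p, lam, a, n)) < 1e-10,
      tail <= exact_mean / delta**2 + 1e-12)
```

The second printed value is the Chebyshev-type bound itself: the probability that the distance is at least `delta` never exceeds the mean squared distance divided by `delta**2`.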

3.3. Convergence in Probability

As with the special case of kernel embeddings, for which , see Smola et al. [15], we may use the bound in Theorem 8 to obtain a statement of convergence in probability.

With notation and assumptions as before, given and fixed , the problem of finding the optimal to minimize is straightforward: is the orthogonal projection of to .

We may assume without loss of generality that are linearly independent, by removing points as necessary without affecting (or losing any information about , since implies by the reproducing property). According to Körezlioglu [16], if is a sampling such that are linearly independent and considering the finite-dimensional subspace of , then the orthogonal projection of onto is given by

for any , where is the inverse of the Gram matrix of .
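The projection formula can be illustrated with a small computation. The sketch below makes several assumptions that are not in the article: a real Gaussian kernel $k(s,t)=e^{-(s-t)^2}$ on the line (chosen only because its Gram matrices at distinct points are invertible), a target $f=\sum_j c_jK_{y_j}$ with hypothetical centres `y`, and sample points `x`. It solves the Gram system instead of forming the inverse, checks that the projection interpolates $f$ at the sample points, and records the squared residual $\|f\|^2-v^{\top}G^{-1}v$, with $v=(f(x_i))_i$, for nested samples.

```python
import numpy as np

def kernel(s, t):
    """Hypothetical Gaussian kernel k(s, t) = exp(-(s - t)^2) on all pairs."""
    return np.exp(-np.subtract.outer(s, t) ** 2)

def projection_coefficients(x, f_at_x):
    """Coefficients alpha of pi f = sum_i alpha[i] K_{x_i}, from G alpha = v."""
    return np.linalg.solve(kernel(x, x), f_at_x)

# f = sum_j c[j] K_{y_j}, so f(t) = sum_j c[j] k(t, y_j) and ||f||^2 = c^T k(y, y) c.
y = np.array([-0.3, 0.9])
c = np.array([1.0, -0.5])
x = np.array([-1.4, -0.7, 0.0, 0.7, 1.4])
f_at_x = kernel(x, y) @ c
norm_f_sq = float(c @ kernel(y, y) @ c)

alpha = projection_coefficients(x, f_at_x)
interpolates = np.allclose(kernel(x, x) @ alpha, f_at_x)

# Squared residual ||f - pi f||^2 for the nested samples x[:1], x[:2], ..., x[:5].
residuals = [
    norm_f_sq - float(f_at_x[:k] @ projection_coefficients(x[:k], f_at_x[:k]))
    for k in range(1, len(x) + 1)
]
print(interpolates, residuals)
```

Because each sample set contains the previous one, the residuals cannot increase, which is the monotonicity exploited later for almost certain convergence.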

More generally, if are not linearly independent, for any subset such that form a basis for , we have and

Note that, in general, is not simply a multiple of , hence, setting for any fixed will not yield the best possible approximation. However, with such coefficients dependent only on , it will be easier to bound across different s than . Then, any upper bound on for some fixed will also be an upper bound on .

Theorem 9. (Convergence in Probability of Projections) Let , , , and be as in Theorem 8. For each sequence and each , let denote the orthogonal projection of onto . Let and, for each and , define Then, for each where .

In particular, if $f$ belongs to $\overline{\mathcal{H}_P}$, the closure of $\mathcal{H}_P$ with respect to the topology of $\mathcal{H}$, then

Proof. Let and fix , arbitrary. Then hence, with notation as in (45), we have . By Theorem 8, this implies Therefore, Thus, since the left-hand side is independent of , In particular, if belongs to , then .☐

3.4. Uniqueness Sets and Almost Certain Convergence of Projections

With notation and assumptions as before, we now follow ([2], Subsection 2.4.4) in recalling the strong convergence of to the identity map as for appropriately chosen . Since is separable, there exists a countable subset of which is total in ; thus, there exists a countable set such that is dense in . This motivates the following definition: a countable subset of is called a uniqueness set for if is a total set in , that is, if such that for all implies . Then, the so-called Ultimate Realization of RKHSs, compare ([2], Theorem 2.33), reads as follows: if is a uniqueness set such that is linearly independent, is the Gram matrix for , , then for each ,

under the topology of , with distance decreasing monotonically. Consequently, for , and for . This has implications in interpolation theory, e.g., see ([2], Corollary 2.6).

Coming back to our problem, by noting that , unlike , is monotonically nonincreasing with respect to , our next goal is to strengthen Theorem 9 to almost certain convergence after passing to a single measure space. First, recall that, e.g., see ([13], Proposition 10.6.1), the countably infinite product space equipped with the smallest -algebra rendering each projection map measurable admits a unique probability measure such that the projection maps are independent random variables with distribution .

Lemma 10. Let , , , and be as in Theorem 8 and . For each define and Then, and, consequently, if , then

Proof. Observe that for each such that , , for each , and hence for each . Then, hence, for any , since is monotone and for all .☐

The main result of this subsection is the following.

Theorem 11. (Almost Certain Convergence of Projections) Let be as in Theorem 8 and suppose is dense in . Then, for each , hence,

Proof. Let . With the same sets defined in (69),

Observe further that whenever , and for each there exists such that , so that

thus, taking into account that is dense in and using Lemma 10, we get

Since is separable let be a countable dense subset of . Since each is a continuous linear operator with operator norm 1, for all iff for all . Thus, by the countable subadditivity of , ☐

In summary, for a given probability measure $P$, under the assumption that it renders the space $\mathcal{H}_P$, the image of $L_{P,K}$, dense in $\mathcal{H}$, a sequence of points sampled independently from $P$ yields a uniqueness set with probability 1. Proposition 6 gives a sufficient condition, valid for many applications, under which this assumption holds.

4. Examples

In this final section, we provide detailed examples of the applicability of the results on approximation error bounds obtained in the previous section.

4.1. Uniform Distribution on a Compact Interval

Let be such that for all and denote . For each define and consider the Hilbert space with the inner product

Then is an orthonormal basis of and, for an arbitrary function , we have the Fourier representation with coefficients subject to the condition where the convergence of the series from (83) is at least guaranteed with respect to the norm . However, for any and , by the Cauchy inequality, we have hence, the convergence in (83) is absolute and uniform on , and in particular is continuous.

By (12), has the reproducing kernel and the convergence of the series is guaranteed at least pointwise. In addition, for any , we have

and hence the kernel is bounded. In particular, this implies that, actually, the series in (86) converges absolutely and uniformly on , hence, the kernel is continuous on . That is, is given by where is a continuous function with period whose Fourier coefficients are all positive and absolutely summable.

Let be the normalized Lebesgue measure on , equivalently, the uniform probability distribution on , and observe that is an orthonormal basis of the Hilbert space . With notation as in (24), we have hence with norms differing by multiplication with . In particular, is an orthonormal basis of the Hilbert space .

We consider now the nonexpansive operator defined as in (26). Then, for any and , we have where the series commutes with the integral either by the Bounded Convergence Theorem for the Lebesgue integral, or by using the uniform convergence of the series and the Riemann integral. Similarly, the Hilbert space , as in Proposition 5, is a RKHS, with kernel,

Thus, letting , and noting that , we have

In particular, is dense in since both contain as dense subsets, but this follows from the more general statement in Proposition 6 as well.

Let now be arbitrary, hence

Then, and, consequently,

Also, for arbitrary as in (83) and (84), we have

Let be a sequence of points in . By Theorem 8 and taking into account the inequality (61), for any and , we have

On the other hand, we observe that in the inequality (95) the left hand side does not depend on and hence, for any there exists such that and then, for sufficiently large , we get

In particular, if , that is, the inequality (84) is replaced by the stronger one we can choose , , and we have , hence

For example, this is the case for for some , hence, , , and letting , hence, , , we have and hence,

This shows that, the larger is, the faster will be approximated but, since , s cannot be approximated uniformly, in the sense that there does not exist a single to make each bounded by the same with the same probability .

This analysis can be applied more generally to kernels that admit an expansion analogous to (86) under basis functions which constitute a total orthonormal set in , e.g., as guaranteed by Mercer’s Theorem ([2], Theorem 2.30).
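A numerical cross-check of this kind of calculation is sketched below. Everything specific in it is an assumption made for illustration: the kernel is taken to be a finite trigonometric sum $K(s,t)=\sum_{|n|\le N}a_n e^{in(s-t)}$ with hypothetical weights $a_n=2^{-|n|}$, the measure is uniform on $[-\pi,\pi]$, and the kernel of the range space is taken to be $K_P(s,t)=\int K(s,z)K(z,t)/K(z,z)\,dP(z)$. Under these assumptions $K(z,z)=\sum_n a_n$ is constant and orthogonality of the exponentials gives $K_P(s,t)=\big(\sum_n a_n\big)^{-1}\sum_n a_n^2e^{in(s-t)}$, which the code compares with an equispaced quadrature.

```python
import numpy as np

N = 3
n = np.arange(-N, N + 1)
a = 2.0 ** (-np.abs(n))                 # hypothetical positive weights a_n

def K(s, t):
    """Assumed kernel K(s, t) = sum_n a_n exp(i n (s - t))."""
    return np.sum(a * np.exp(1j * n * (s - t)))

def K_P_series(s, t):
    """Series form obtained from orthogonality of the exponentials."""
    return np.sum(a**2 * np.exp(1j * n * (s - t))) / a.sum()

def K_P_quadrature(s, t, M=4 * N + 1):
    """Integral of K(s, z) K(z, t) / K(z, z) against the uniform measure.

    An equispaced rule with M > 2N nodes integrates the trigonometric
    polynomial integrand exactly, up to rounding.
    """
    z = -np.pi + 2 * np.pi * np.arange(M) / M
    vals = [K(s, zk) * K(zk, t) / K(zk, zk) for zk in z]
    return np.mean(vals)

s, t = 0.3, -1.1
print(abs(K_P_series(s, t) - K_P_quadrature(s, t)) < 1e-12)
```

The squared weights in the series form show why the range space is strictly smaller than the original space in this model: membership requires faster decay of the Fourier coefficients.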

4.2. The Hardy Space

We consider the open unit disc in the complex plane and the Szegö kernel where the series converges absolutely and uniformly on any compact subset of . The RKHS associated to is the Hardy space of all functions that are holomorphic in with power series expansion

such that the coefficients sequence is in . The inner product in is with norm

For each , we have hence, the kernel is unbounded.

We consider the normalized Lebesgue measure on , that is, for , we have hence,

Then, is contractively embedded in .

Further on, in view of Proposition 5 and (101), for any , we have which, by using twice the Bounded Convergence Theorem for the Lebesgue measure, equals

This shows that the RKHS induced by consists of all functions that are holomorphic in with power series representation and such that

In particular, an orthonormal basis of is and hence is dense in the Hardy space .
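For orientation, here is how the computation behind this description can be carried out. This is a reconstruction under assumptions, since the displays are not legible in this text: the Szegö kernel is written as $K(z,w)=\sum_{n\ge0}z^n\overline{w}^n$, so that $K(\zeta,\zeta)=(1-|\zeta|^2)^{-1}$, $P$ is the normalized area measure $dA/\pi$ on $\mathbb{D}$, and the kernel of the range space is taken to be $K_P(z,w)=\int K(z,\zeta)K(\zeta,w)/K(\zeta,\zeta)\,dP(\zeta)$.

```latex
% Sketch under the assumptions stated above; the interchange of sum and
% integral is the step the text justifies by the Bounded Convergence Theorem.
\begin{align*}
K_P(z,w)
  &= \sum_{n,m\ge 0} z^{n}\,\overline{w}^{\,m}
     \int_{\mathbb D} \overline{\zeta}^{\,n}\,\zeta^{m}\,(1-|\zeta|^{2})\,\frac{dA(\zeta)}{\pi} \\
  &= \sum_{n\ge 0} z^{n}\,\overline{w}^{\,n} \int_{0}^{1} 2\,r^{2n+1}\,(1-r^{2})\,dr
   = \sum_{n\ge 0} z^{n}\,\overline{w}^{\,n}\Big(\frac{1}{n+1}-\frac{1}{n+2}\Big)
   = \sum_{n\ge 0} \frac{z^{n}\,\overline{w}^{\,n}}{(n+1)(n+2)}.
\end{align*}
```

If this reading is right, the range space consists of the power series $\sum_n f_nz^n$ with $\sum_n(n+1)(n+2)|f_n|^2<\infty$, the functions $z^n/\sqrt{(n+1)(n+2)}$ form an orthonormal basis of it, and density in the Hardy space follows because the polynomials belong to both spaces.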

In order to calculate the operator , let be arbitrary, that is, is a complex-valued measurable function on such that

Then, in view of Proposition 3, we have which, by the Bounded Convergence Theorem, equals where for each integer we denote

Observing that, letting , for all integer and , the set is orthonormal in , it follows that for all integer and, hence, is the weighted sequence of Fourier coefficients of with respect to the system of orthonormal functions in . On the other hand, since is contractively embedded in , this shows that is the restriction to of a Bergman type weighted projection of onto a subspace of the Hardy space , that happens to be exactly .

Finally, let with power series representation as in (102) and let with norm given as in (111). Then, by Theorem 8 and taking into account the inequality (61), for any and , we have

where denotes an arbitrary sequence of points in and denotes the projection of onto . By exploiting the fact that the left hand side in (115) does not depend on and the density of in , for any there exists such that

and hence, for sufficiently large, we have

Let us consider now the special case when the function , that is, with respect to the representation as in (102), we have the stronger condition

In this case, letting calculations similar to (108) and (112) show that hence , and hence, the first term in the right hand side of (115) vanishes and we get

For example, if for some integer , then

showing that better approximations are obtained for smaller than for bigger .
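The projection onto the span of kernel functions at finitely many points can be computed directly, which gives a way to observe this behaviour numerically. The following is a minimal sketch, not the authors' method. It assumes the kernel convention $K(z,w) = 1/(1 - z\overline{w})$, distinct sample points, and monomial test functions $f(z) = z^k$ (an assumption made here for illustration). The names `szego_kernel`, `sample_unit_disc` and `projection_error_sq` are hypothetical. By the reproducing property, the projection interpolates $f$ at the sample points, so its coefficients solve a linear system with the Gram matrix of the kernel, and the squared error follows from the Pythagorean identity.

```python
import numpy as np


def szego_kernel(z, w):
    """Szego kernel K(z, w) = 1 / (1 - z * conj(w)) on the unit disc (assumed convention)."""
    return 1.0 / (1.0 - z * np.conj(w))


def sample_unit_disc(n, rng):
    """Draw n points from the normalized Lebesgue (area) measure on the unit disc."""
    radius = np.sqrt(rng.uniform(size=n))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return radius * np.exp(1j * angle)


def projection_error_sq(coeffs, points):
    """Squared Hardy-norm distance from the polynomial f = sum_n coeffs[n] z^n
    to its orthogonal projection onto span{K(., x_i)} for distinct points x_i."""
    coeffs = np.asarray(coeffs, dtype=complex)
    points = np.asarray(points, dtype=complex)
    # Values f(x_i); np.polyval expects the highest-degree coefficient first.
    fvals = np.polyval(coeffs[::-1], points)
    # Gram matrix G[i, j] = K(x_i, x_j), Hermitian positive definite for distinct points.
    gram = szego_kernel(points[:, None], points[None, :])
    # The projection g = sum_j c_j K(., x_j) satisfies g(x_i) = f(x_i) for all i.
    c = np.linalg.solve(gram, fvals)
    norm_f_sq = np.sum(np.abs(coeffs) ** 2)
    norm_proj_sq = np.real(np.vdot(c, fvals))  # equals c^H G c
    return float(norm_f_sq - norm_proj_sq)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    points = sample_unit_disc(8, rng)
    for k in (1, 2, 4):
        monomial = np.zeros(k + 1)
        monomial[k] = 1.0  # coefficients of f(z) = z^k
        # Nested point sets: the error cannot increase as points are added.
        errors = [projection_error_sq(monomial, points[:n]) for n in (2, 4, 8)]
        print(k, errors)
```

For instance, with the two sample points $0$ and $1/2$, the squared error is $1/4$ for $f(z) = z$ and $13/16$ for $f(z) = z^2$, so the lower-degree monomial is approximated better. The Gram matrix becomes ill-conditioned as the number of points grows or when points nearly coincide, so this direct solve is only suitable for small point sets.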

5. Conclusions

Certain key properties of Hilbert spaces drive the analysis carried out in this article, as well as the properties of reproducing kernel Hilbert spaces that make them attractive for function approximation. The Hilbert space structure provides orthogonal projections as the unique best approximation, which can be computed using the reproducing property as an exact interpolation and are shown to converge monotonically to the function for uniqueness sets. The monotonicity of convergence is then used to derive almost certain convergence directly from convergence in probability, and thus to establish sufficient conditions for almost every sequence of samples from a probability distribution to be a uniqueness set. For the approximation bound itself, stated in Theorem 8, the mean squared distance in Chebyshev’s inequality can be calculated explicitly because the norm is induced by an inner product and the Bochner integral exists.

We did not include in this article an example with the Gaussian kernel, one of the most useful kernels in applications, although calculations similar to those in Section 4 are available. One reason for this omission is that Gaussian kernels have additional invariance and differentiability/analyticity properties that can be used to obtain stronger results by slightly different techniques. That work is in progress and will be the subject of a future article.

Data Availability

No data were used in this study.

Disclosure

Ata Deniz Aydın's current address is ETH Zürich, Department of Mathematics, Rämistraße 101, 8092 Zürich, Switzerland.

Conflicts of Interest

The authors of this article declare no conflicts of interest.

Copyright

Copyright © 2021 Ata Deniz Aydın and Aurelian Gheondea. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
