`Abstract and Applied Analysis, Volume 2014 (2014), Article ID 402918, 6 pages, http://dx.doi.org/10.1155/2014/402918`
Research Article

## $\epsilon$-Coverings of Hölder-Zygmund Type Spaces on Data-Defined Manifolds

1 Department of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria
2 Faculty of Mathematics, Technische Universität München, Boltzmannstraße 3, 85748 Garching, Germany
3 Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany

Received 13 February 2014; Accepted 19 May 2014; Published 17 June 2014

Copyright © 2014 Martin Ehler and Frank Filbir. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We first determine the asymptotics of the $\epsilon$-covering numbers of Hölder-Zygmund type spaces on data-defined manifolds. Secondly, a fully discrete and finite algorithmic scheme is developed providing explicit $\epsilon$-coverings whose cardinality is asymptotically near the $\epsilon$-covering number. Given an arbitrary Hölder-Zygmund type function, the nearby center of a ball in the $\epsilon$-covering can also be computed in a discrete finite fashion.

#### 1. Introduction

Data processing in the digital era often deals with finitely many high-dimensional data chunks stemming from measurements that obey some continuous physical model. The implementation and numerical evaluation require estimates on the accuracy of the discretization with respect to the underlying model. As an elementary tool providing accuracy guarantees, we will address $\epsilon$-coverings of some function spaces related to information theory and machine learning.

As a standard concept in discrete mathematics, the $\epsilon$-covering number is the minimal number of balls of radius $\epsilon$ that cover a given compact metric space. An arbitrary element of this space can be represented by a nearby center, preserving precision up to $\epsilon$. As such, $\epsilon$-coverings are also an integral part of approximation theory, especially if the underlying space is some function space. Covering numbers capture the complexity of the space, and the approximation aspects are used in many fields such as information theory, statistics, nonparametric density estimation, and machine learning. There are estimates on the asymptotics of the $\epsilon$-covering numbers of the standard function spaces (cf. [1, 2]), but some fields such as machine learning involve data lying on some manifold, so that target functions are naturally defined on this manifold. To clarify the terminology, we consider smoothness spaces on manifolds as somewhat nonstandard function spaces. It may be possible that the covering number of a function space on some compact Riemannian manifold can be assembled from covering numbers of standard function spaces on Euclidean spaces derived from the charts. However, it is also important to derive explicit $\epsilon$-coverings whose cardinality is near the benchmark given by the $\epsilon$-covering number. We believe that explicit coverings may be harder to construct using the charts due to interface problems; therefore, we will not pursue this direction and will instead follow a more global approach.
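To make the notion of a covering and of a "nearby center" concrete, the following minimal sketch builds an $\epsilon$-covering of a finite point set greedily. It is only an illustration: the function names `greedy_epsilon_cover` and `nearest_center` and the toy data are assumptions introduced here, the Euclidean distance stands in for a general metric, and the greedy choice yields an upper bound on the covering number rather than the minimal number of balls.

```python
import numpy as np

def greedy_epsilon_cover(points, eps):
    """Return indices of centers such that every point is within eps of a center."""
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    centers = []
    while uncovered.any():
        idx = int(np.argmax(uncovered))      # first point that is still uncovered
        centers.append(idx)
        dist = np.linalg.norm(points - points[idx], axis=1)
        uncovered &= dist > eps              # points within eps are now covered
    return centers

def nearest_center(points, centers, x):
    """Index (into points) of the center closest to x."""
    pts = np.asarray(points, dtype=float)[centers]
    dist = np.linalg.norm(pts - np.asarray(x, dtype=float), axis=1)
    return centers[int(np.argmin(dist))]

# Toy data (hypothetical): 200 points on the unit circle, eps = 0.5.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
eps = 0.5
centers = greedy_epsilon_cover(circle, eps)
rep = nearest_center(circle, centers, circle[57])
```

Each point is represented by `rep`-style lookups with error at most `eps`; for a function space, the distance computation itself is the step that has to be made finite.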

In general, there is still demand for computing coverings of many discrete and continuous spaces [3]. As an important additional requirement, any covering of a function space needs to come with an algorithmic scheme to determine some function’s nearby center in an effective manner. At first sight, the latter seems simple enough as we can take the center whose distance is minimal. However, determining the distance between two functions is ultimately a continuous operation, and one is particularly interested in finite methods.

In this paper, we first determine the asymptotics of the $\epsilon$-covering number for the unit ball of some Hölder-Zygmund type space on an underlying smooth compact Riemannian manifold (without boundary and with nonnegative Ricci curvature). In fact, we determine the asymptotics of the metric entropy , which is the number of bits needed to enumerate the $\epsilon$-covering (cf. [1]). Moreover, we compute an explicit $\epsilon$-covering, such that where is the cardinality of the constructed covering and means that the left-hand side can be bounded by a generic constant times the right-hand side. Hence, our covering is optimal up to a logarithmic factor by means of the metric entropy. We allow the underlying manifold to be unknown in our scheme and, instead, to be represented through a finite sampling. This sampling must be chosen carefully and is the key to obtaining a finite scheme. The centers of our $\epsilon$-covering can then be determined through a finite process, and we can measure any function’s distance to these centers in a finite manner.

For constructions of $\epsilon$-coverings on periodic smoothness spaces, for instance, we refer to [4, 5]. The concept of $\epsilon$-entropy is also closely related to entropy numbers; see [6–8].

The outline of this paper is as follows. In Section 2 we introduce the setting, define the Hölder-Zygmund type space , and determine the metric entropy for its unit ball. An explicit covering is computed in Section 3.

#### 2. Covering Numbers for Hölder-Zygmund Type Spaces

We first fix the setting and list some technical assumptions used throughout the paper. Let be an -dimensional compact and connected Riemannian manifold without boundary and with nonnegative Ricci curvature, geodesic distance , and being the normalized Riemannian volume measure on ; are the eigenfunctions of the Laplacian on , and are the corresponding eigenvalues arranged in nonincreasing order, so that . Readers who are not familiar with some terms from differential geometry that are used here may simply think of a “nice” manifold without boundary, such as the sphere, the real projective space, the (real) Grassmann manifold, or more generally compact homogeneous spaces. The above properties ensure certain estimates on the heat kernel on (see [9, 10]), which were used in a series of papers [9, 11–13] to develop approximation schemes for smooth functions on the manifold. Here, we will make use of those approximation schemes, but we will keep the technical details at a minimum level.

Let be a positive integer and most of the time we will restrict ourselves to , where is some nonnegative integer. The space of diffusion polynomials up to degree is Later, we will use the fact that the above conditions imply the following estimate on the Christoffel function: (cf. [9–11]), so that integration and orthonormality yield . Here, the symbol indicates that each side is bounded by a generic positive constant times the other side.

In traditional scenarios, the accuracy of approximation by polynomials is closely related to the smoothness of the function. Therefore, the accuracy of approximation itself is nowadays considered to be a measurement of smoothness. This viewpoint is particularly useful in our setting because defining smoothness in a classical manner would require more technical details. Here, we define the Hölder-Zygmund type space of order by , where its norm is given by with . Hence, is contained in the Hölder-Zygmund type space if and only if it can be approximated by at rate . Since the eigenfunctions are known to be smooth and we consider the -norm, each function in has a continuous representative and point evaluation makes sense. The unit ball in is denoted by . To compute its covering number, we first establish compactness. Since is not finite-dimensional, is not compact in the Hölder-Zygmund type space, but we consider it as a subspace of .
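The idea of measuring smoothness through the rate of approximation can be illustrated on the unit circle, where the Laplacian eigenfunctions are trigonometric functions. The sketch below is an illustration under stated assumptions, not the construction of the paper: it assumes that the rate has the form $N^{-s}$ for a smoothness order $s$, it uses the hypothetical test function $f(x) = \sum_{k \geq 1} k^{-(1+s)} \cos(kx)$, and it bounds the best approximation error from above by the error of the partial Fourier sum, which equals the tail of the coefficient series.

```python
import numpy as np

s = 0.5                    # assumed smoothness order (hypothetical value)
K = 200_000                # series truncation, needed only for the numerics
k = np.arange(1, K + 1, dtype=float)
coeff = k ** (-(1.0 + s))  # f(x) = sum_k coeff_k * cos(k x)

# suffix[i] = sum of coeff over k >= i + 1. For the partial sum S_N f, the
# sup-norm error is attained at x = 0 and equals sum_{k > N} coeff_k, which
# therefore bounds the best approximation error by polynomials of degree N.
suffix = np.cumsum(coeff[::-1])[::-1]

degrees = 2 ** np.arange(0, 11)                  # N = 1, 2, 4, ..., 1024
tails = np.array([suffix[N] for N in degrees])   # sum over k >= N + 1
scaled = degrees ** s * tails                    # N^s times the error bound
```

Comparing the tail with the integral of $t^{-(1+s)}$ over $[N, \infty)$ shows that `scaled` never exceeds $1/s$, so this particular function is approximated at rate $N^{-s}$.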

Lemma 1. The set is compact in .

The compactness of this embedding can be derived from (4) by abstract arguments involving Kolmogorov numbers (cf. [6]). Here, we provide a simple elementary proof for the sake of completeness.

Proof. We aim to verify that any sequence must have an accumulation point in this set. Since each space is finite-dimensional, there are , such that . The latter implies that is bounded for all and . Thus, there is such that the subsequence converges towards . For any , we can recursively construct such that and is a subsequence of . For , this construction yields that is a subsequence of , so that we derive Therefore, is a Cauchy sequence and, hence, converges towards some . Standard calculations reveal that is an accumulation point of and is contained in , which concludes the proof.

We can now derive the asymptotics of the $\epsilon$-covering number of in .

Theorem 2. If is fixed and , then holds, where the generic constants do not depend on .

Analogous results can be derived for similar concepts such as different types of $n$-widths of function spaces (cf. [14–16]). Theorem 2 and its proof are rather classical and can be derived from [17]. To guide the interested reader, we will provide the outline of the proof that is based on a general Banach space result and is also used in [18, Theorem 4.1]. Let be a Banach space and let be a sequence of linearly independent elements whose linear span is dense in , and define with . Let be a nonincreasing sequence of positive numbers with . The full approximation space is A proof similar to Lemma 1 yields that this space is compact, and we can formulate the result from Banach space theory that goes back to Lorentz in [17].

Theorem 3 (see [19, Theorem 3.3]). Let be a nonincreasing sequence of positive numbers such that , for and some constant . For , let . If denotes the $\epsilon$-covering number of in , then one has, for , where .

At this point our preparations are complete.

Proof of Theorem 2. We aim to apply Theorem 3 with the function system and with being the closure of in . There, the index set is supposed to start with , so we set , . To define the sequence , we need some preparations. As pointed out before, integrating (3) over yields . By using , we derive, for , Therefore, there are constants , for , such that the definitions , , and , , lead to which also yields Since , for , we can apply Theorem 3. According to [18, Lemma 4.1], , so that the choice of in (9) implies (7).

Remark 4. The proof of Theorem 2 reveals that (7) also holds under much weaker conditions, and we have only used the fact that there is a sequence of linearly independent functions , so that the polynomial spaces in (2) satisfy .

#### 3. Near Optimal Covering

This section is dedicated to constructing our covering of the unit ball in the Hölder-Zygmund type space, which is based on localized summation kernels as developed in a series of papers [9, 11–13]. We first need some preparations. A Borel probability measure on is called a quadrature measure of order if Note that our setting yields that there is a constant such that for all and all (cf. [11, Theorem ]); see also [20] for homogeneous spaces. The existence of quadrature measures with finite support is proved for fairly general smooth Riemannian manifolds in [11], where a construction procedure is outlined. In fact, the support of can be chosen to be contained in any sufficiently dense finite sampling of , so that can be identified with and nonnegative weights satisfying . Examples on the sphere, for instance, are given in [21].
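A finitely supported quadrature measure can be checked by hand in the simplest case of the unit circle. The sketch below is an illustration only: it assumes the normalized measure $dx/(2\pi)$, takes the trigonometric functions $1, \cos(kx), \sin(kx)$ as the eigenfunctions, and tests exactness on single basis functions, which may differ from the exact condition used in the definition above. The helper names are hypothetical.

```python
import numpy as np

def equispaced_quadrature(m):
    """m equispaced nodes on the circle with equal weights summing to 1."""
    nodes = 2.0 * np.pi * np.arange(m) / m
    weights = np.full(m, 1.0 / m)
    return nodes, weights

def max_quadrature_error(nodes, weights, degree):
    """Largest quadrature error over 1, cos(kx), sin(kx), k = 1..degree.

    With respect to dx / (2*pi), the constant integrates to 1 and every
    cos(kx), sin(kx) with k >= 1 integrates to 0.
    """
    errors = [abs(weights.sum() - 1.0)]
    for k in range(1, degree + 1):
        errors.append(abs(np.dot(weights, np.cos(k * nodes))))
        errors.append(abs(np.dot(weights, np.sin(k * nodes))))
    return max(errors)

nodes, weights = equispaced_quadrature(16)
err_exact = max_quadrature_error(nodes, weights, 15)  # all degrees below 16
err_alias = max_quadrature_error(nodes, weights, 16)  # cos(16 x) equals 1 on the nodes
```

With 16 nodes the rule is exact up to rounding for degrees below 16 and fails at degree 16, which reflects the remark that larger orders require denser samplings.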

The results in [11] yield that we can even choose a sequence of quadrature measures of order , respectively, such that . For the remaining part of the paper, we will suppose that this estimate holds and we define, for , where is an infinitely often differentiable and nonincreasing function with for and for . Although we will not explicitly use it in the present paper, we want to point out that many advantageous properties of are steered by the so-called localization of the kernel ; that is, for fixed and all with , See [12, 13]. Later, we will apply (cf. [11]). Those estimates are used in [12, 13] to characterize the Hölder-Zygmund type smoothness by means of .
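The role of the smooth cutoff function can be illustrated on the unit circle. The sketch below rests on assumptions that are not taken from the text: the cutoff `cutoff` is one standard infinitely differentiable, nonincreasing choice that equals 1 on $[0, 1/2]$ and 0 on $[1, \infty)$ (these breakpoints are assumed), and `circle_kernel` is the filtered sum $1 + 2\sum_{k=1}^{N} h(k/N)\cos(k(x-y))$, a circle analogue of a kernel of this type rather than the kernel (14) itself.

```python
import numpy as np

def _g(t):
    """exp(-1/t) for t > 0 and 0 otherwise; infinitely differentiable."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)              # avoids division by zero
    return np.where(t > 0, np.exp(-1.0 / safe), 0.0)

def cutoff(t):
    """Smooth nonincreasing cutoff: 1 for t <= 1/2, 0 for t >= 1 (assumed breakpoints)."""
    t = np.asarray(t, dtype=float)
    a, b = _g(1.0 - t), _g(t - 0.5)
    return a / (a + b)

def circle_kernel(N, x, y):
    """Filtered sum 1 + 2 * sum_{k=1}^{N} h(k/N) * cos(k * (x - y))."""
    k = np.arange(1, N + 1)
    return 1.0 + 2.0 * np.sum(cutoff(k / N) * np.cos(k * (x - y)))

N = 32
on_diagonal = circle_kernel(N, 0.0, 0.0)      # x = y
off_diagonal = circle_kernel(N, 0.0, np.pi)   # antipodal points
```

In this toy case the kernel is large on the diagonal and small at the antipodal point, which is the qualitative behaviour that the localization estimate quantifies.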

Theorem 5. Assume that is a family of quadrature measures of order , respectively. Then, for all , one has where the generic constants do not depend on or . On the other hand, if, for , there are generic constants not depending on such that holds, then .

Next, by using and applying the quadrature property of , a straightforward calculation yields For some fixed , we define the actual approximation by In other words, we replace in (18) with a number on the grid . We define the following collection: which induces a covering of in .
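The quantization step, replacing a real number by a nearby grid point, can be sketched as follows. This is a generic illustration: the grid spacing `step`, the sample values, and the function name `quantize_to_grid` are assumptions, and the specific quantity and grid used in (19) are not reproduced here.

```python
import numpy as np

def quantize_to_grid(values, step):
    """Round each value to the nearest multiple of step.

    Returns the integer grid indices (a finite description of the result)
    and the quantized values themselves.
    """
    values = np.asarray(values, dtype=float)
    codes = np.round(values / step).astype(int)
    return codes, codes * step

# Hypothetical scalars to be discretized and a hypothetical grid spacing.
values = np.array([0.731, -0.258, 0.0419, 0.0031])
step = 0.01
codes, quantized = quantize_to_grid(values, step)
max_error = np.max(np.abs(quantized - values))
```

Each scalar moves by at most `step / 2`, and since the scalars are bounded, only finitely many integer codes can occur, which is what makes counting the elements of the resulting collection possible.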

Theorem 6. For fixed and , one applies the discretization (19). Then, there is a constant such that, for all , holds. Thus, for , the collection induces an $\epsilon$-covering of in . Its cardinality satisfies where the generic constant does not depend on .

Proof of Theorem 6. The triangle inequality yields Since Theorem 5 implies , we only need to take care of the term on the far right. The quantization (19) immediately yields so that (18) and (16) imply Hence, we have derived the estimate on .
To tackle (21), we apply (23), which yields According to [13, Theorem 5.1], holds. Since is contained in the ball of radius 1, we see that Thus, the number of possible values of for fixed is at most , where is a positive constant. Note that we can assume that because, otherwise, would be zero. Since , we have . Therefore, we have , for some positive constant . By using , we obtain which concludes the proof.

According to Theorems 2 and 6, the $\epsilon$-covering number of and the number of $\epsilon$-balls induced by satisfy Therefore, our scheme is optimal up to a logarithmic factor by means of the metric entropy.

Our results are also related to the field of manifold learning, in which a function must be reconstructed from finite training data (cf. [22–25]). When actually applying our scheme, we first acquire a set of samples sufficiently well covering and we also need the function values , which altogether build the training data. Next, we compute a quadrature measure for some maximal such that ; see [11, 21] for an algorithm. Here, we need that the sample points are well distributed and larger require more samples. An element in that is $\epsilon$-close to is simply given by , whose computation only requires knowledge of and on the finite set ; see (14) and (19). In other words, we do not need to know the entire manifold but only the finite sampling of the training data , the sampling of the target function , and, more delicately, the sampling of the eigenfunctions , of the Laplacian. Those eigenfunctions, however, are not explicitly known except for a few special cases, such as the sphere, projective space, the Grassmann manifold, and a few more. Fortunately, approximation of those eigenfunctions is a common procedure in manifold learning. Computational schemes are based on the graph Laplacian, which is built from the training data and, at least under suitable assumptions, converges towards the Laplacian on the manifold when the cardinality of the data increases (cf. [26–28] and references therein). Those schemes approximately sample the first few eigenfunctions on the training data. Thus, our proposed approach is indeed fully discrete and computationally feasible even if the eigenfunctions are not explicitly known. In fact, the manifold itself can be unknown. As long as satisfies the theoretical assumptions, it is simply represented by means of a finite sample.
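The graph Laplacian step mentioned above can be sketched in a few lines. This is a toy illustration under assumptions, not the procedure of [26–28]: it uses an unnormalized graph Laplacian with Gaussian weights, a hypothetical bandwidth of 0.3, and equispaced samples on the unit circle; the normalization and scaling needed for convergence to the manifold Laplacian are deliberately left out.

```python
import numpy as np

def graph_laplacian_eigs(points, bandwidth, n_eigs):
    """Smallest eigenpairs of the unnormalized graph Laplacian L = D - W."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    W = np.exp(-sq_dist / (2.0 * bandwidth ** 2))   # Gaussian edge weights
    np.fill_diagonal(W, 0.0)                        # no self-loops
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    return vals[:n_eigs], vecs[:, :n_eigs]

# Training samples (hypothetical): 100 equispaced points on the unit circle.
theta = 2.0 * np.pi * np.arange(100) / 100
samples = np.column_stack([np.cos(theta), np.sin(theta)])
vals, vecs = graph_laplacian_eigs(samples, bandwidth=0.3, n_eigs=3)
```

For this symmetric sampling the first eigenvector is constant and the next two eigenvalues coincide, mirroring the constant function and the pair $\cos$, $\sin$ on the circle; the columns of `vecs` play the role of eigenfunctions sampled on the training data.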

Remark 7. The technical assumptions on the manifold and the function system imply certain estimates on the heat kernel on (see [9, 10]), mainly used to ensure that the localization property (15) holds (cf. [12, 13]). Our assumptions also imply the existence of quadrature measures and that for all and some constant . These items lead to the characterization of the Hölder-Zygmund type space by means of in Theorem 5. Moreover, the family can be chosen with finite support, in fact with (cf. [11]). Theorem 5 and are indeed the two main ingredients of the proof of our results in Theorem 6.

Remark 8. The reader familiar with the approximation scheme developed in [9, 11–13] may expect that the presented results can be generalized to a wider class of Besov spaces on metric spaces. This is indeed true but requires more technical details and does not lead to a fully discrete scheme in the end. Here, we intended to emphasize the main ideas by keeping technical details at a minimum level and to focus on the development of a fully discrete covering algorithm. The more general approach will be described elsewhere.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

Martin Ehler has been funded by the Vienna Science and Technology Fund (WWTF) through Project VRG12-009. The research of Frank Filbir was partially funded by the Deutsche Forschungsgemeinschaft Grant FI 883/3-1. Both authors thank H. N. Mhaskar for many fruitful discussions.

#### References

1. A. N. Kolmogorov and V. M. Tihomirov, “$\epsilon$-entropy and $\epsilon$-capacity of sets in function spaces,” Uspekhi Matematicheskikh Nauk, vol. 14, no. 2, pp. 3–86, 1959, English translation in American Mathematical Society, vol. 17, pp. 277–364, 1961.
2. A. G. Vituškin, Theory of the Transmission and Processing of Information, Pergamon, 1961.
3. A. Schürmann and F. Vallentin, “Computational approaches to lattice packing and covering problems,” Discrete & Computational Geometry, vol. 35, no. 1, pp. 73–116, 2006.
4. D. Dung, “Non-linear approximations using sets of finite cardinality or finite pseudo-dimension,” Journal of Complexity, vol. 17, no. 2, pp. 467–492, 2001.
5. V. N. Temlyakov, “Estimates for the asymptotic characteristics of classes of functions with bounded mixed derivative or difference,” Proceedings of the Steklov Institute of Mathematics, vol. 189, pp. 161–197, 1990.
6. B. Carl and I. Stephani, Entropy, Compactness and the Approximation of Operators, Cambridge University Press, Cambridge, UK, 1990.
7. D. E. Edmunds and H. Triebel, Function Spaces, Entropy Numbers, Differential Operators, vol. 120, Cambridge University Press, Cambridge, UK, 1996.
8. D. D. Haroske and H. Triebel, Distributions, Sobolev Spaces, Elliptic Equations, European Mathematical Society, 2008.
9. C. K. Chui and H. N. Mhaskar, “Smooth function extension based on high dimensional unstructured data,” Mathematics of Computation, 2013.
10. E. B. Davies, “${L}^{p}$ spectral theory of higher-order elliptic differential operators,” The Bulletin of the London Mathematical Society, vol. 29, no. 5, pp. 513–546, 1997.
11. F. Filbir and H. N. Mhaskar, “Marcinkiewicz-Zygmund measures on manifolds,” Journal of Complexity, vol. 27, no. 6, pp. 568–596, 2011.
12. M. Maggioni and H. N. Mhaskar, “Diffusion polynomial frames on metric measure spaces,” Applied and Computational Harmonic Analysis, vol. 24, no. 3, pp. 329–353, 2008.
13. H. N. Mhaskar, “Eignets for function approximation on manifolds,” Applied and Computational Harmonic Analysis, vol. 29, no. 1, pp. 63–87, 2010.
14. D. Dũng and T. Ullrich, “$n$-widths and $\epsilon$-dimensions for high-dimensional approximations,” Foundations of Computational Mathematics, vol. 13, no. 6, pp. 965–1003, 2013.
15. E. Novak, “Optimal recovery and $n$-widths for convex classes of functions,” Journal of Approximation Theory, vol. 80, no. 3, pp. 390–408, 1995.
16. A. Pinkus, n-Widths in Approximation Theory, Springer, Berlin, Germany, 1985.
17. G. G. Lorentz, “Metric entropy and approximation,” Bulletin of the American Mathematical Society, vol. 72, pp. 903–937, 1966.
18. H. N. Mhaskar, “On the representation of smooth functions on the sphere using finitely many bits,” Applied and Computational Harmonic Analysis, vol. 18, no. 3, pp. 215–233, 2005.
19. G. G. Lorentz, M. V. Golitschek, and Y. Makovoz, Constructive Approximation, Advanced Problems, Springer, New York, NY, USA, 1996.
20. D. Geller and I. Z. Pesenson, “Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds,” Journal of Geometric Analysis, vol. 21, no. 2, pp. 334–371, 2011.
21. Q. T. Le Gia and H. N. Mhaskar, “Localized linear polynomial operators and quadrature formulas on the sphere,” SIAM Journal on Numerical Analysis, vol. 47, no. 1, pp. 440–466, 2009.
22. M. Belkin and P. Niyogi, “Semi-supervised learning on Riemannian manifolds,” Machine Learning, vol. 56, no. 1–3, pp. 209–239, 2004.
23. M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: a geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
24. M. Gavish, B. Nadler, and R. R. Coifman, “Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning,” in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 367–374, June 2010.
25. A. D. Szlam, M. Maggioni, and R. R. Coifman, “Regularization on graphs with function-adapted diffusion processes,” Journal of Machine Learning Research, vol. 9, pp. 1711–1739, 2008.
26. M. Belkin and P. Niyogi, “Convergence of Laplacian eigenmaps,” in Proceedings of the NIPS, B. Schölkopf, J. C. Platt, and T. Hoffman, Eds., pp. 129–136, MIT Press, 2006.
27. B. Nadler, S. Lafon, R. R. Coifman, and I. G. Kevrekidis, “Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators,” in Advances in Neural Information Processing Systems, Y. Weiss, B. Schölkopf, and J. Platt, Eds., vol. 18, MIT Press, Cambridge, Mass, USA, 2006.
28. U. von Luxburg, M. Belkin, and O. Bousquet, “Consistency of spectral clustering,” The Annals of Statistics, vol. 36, no. 2, pp. 555–586, 2008.