Research Article | Open Access
A Natural Diffusion Distance and Equivalence of Local Convergence and Local Equicontinuity for a General Symmetric Diffusion Semigroup
In this paper, we consider a general symmetric diffusion semigroup $\{T_t f\}_{t \ge 0}$ on a topological space $X$ with a positive $\sigma$-finite measure, given, for $t > 0$, by an integral kernel operator: $T_t f(x) = \int_X \rho_t(x, y) f(y)\, dy$. As one of the contributions of our paper, we define a diffusion distance whose specification follows naturally from imposing a reasonable Lipschitz condition on diffused versions of arbitrary bounded functions. We next show that the mild assumption we make, that balls of positive radius have positive measure, is equivalent to a similar, and an even milder looking, geometric demand. In the main part of the paper, we establish that local convergence of $T_t f$ to $f$ is equivalent to local equicontinuity (in $t$) of the family $\{T_t f\}_{t \ge 0}$. As a corollary of our main result, we show that, for $t_0 > 0$, $T_{t + t_0} f$ converges locally to $T_{t_0} f$, as $t$ converges to $0^+$. In the Appendix, we show that for very general metrics $D$ on $X$, not necessarily arising from diffusion, $\int_X \rho_t(x, y) D(x, y)\, dy \to 0$ a.e., as $t \to 0^+$. R. Coifman and W. Leeb have assumed a quantitative version of this convergence, uniformly in $x$, in their recent work introducing a family of multiscale diffusion distances and establishing quantitative results about the equivalence of a bounded function $f$ being Lipschitz, and the rate of convergence of $T_t f$ to $f$, as $t \to 0^+$. We do not make such an assumption in the present work.
Diffusion semigroups play an important role in analysis, both theoretical and applied. Diffusion semigroups include the heat semigroup and, more generally, as discussed in, e.g., , arise from considering large classes of elliptic second-order (partial) differential operators on domains in Euclidean space or on manifolds. For examples of theoretical results involving diffusion semigroups, the interested reader may refer to Sturm  and Wu . Some recent applications of diffusion semigroups to dimensionality reduction, data representation, multiscale analysis of complex structures, and the definition and efficient computation of natural diffusion distances can be found in, e.g., [4–11].
A particularly important issue in harmonic analysis is to connect the smoothness of a function with the speed of convergence of its diffused version to itself, in the limit as time goes to zero. For the Euclidean setting, see, for example, [12, 13]. In order to consider the smoothness of diffusing functions in more general settings, a distance defined in terms of the diffusion itself seems particularly appropriate.
Defining diffusion distances is of interest in applications as well. As discussed in , dimensionality reduction of data and the concomitant issue of finding structures in data are highly important objectives in the fields of information theory, statistics, machine learning, sampling theory, etc. It is often useful to organize the given data as nodes in a weighted graph, where the weights reflect local interaction between data points. Random walks, or diffusion, on graphs may then help understand the interactions among the data points at increasing distance scales. To even consider different distance scales, it is necessary to define an appropriate diffusion distance on the constructed data graph.
In this paper, we consider a general symmetric diffusion semigroup $\{T_t f\}_{t \ge 0}$ on a topological space $X$ with a positive $\sigma$-finite measure (i.e., $X$ is a countable union of measurable sets with finite measure), given, for $t > 0$, by an integral kernel operator: $T_t f(x) = \int_X \rho_t(x, y) f(y)\, dy$. As part of their work in [7, 11], Coifman and Leeb introduce a family of multiscale diffusion distances and establish quantitative results about the equivalence of a bounded function $f$ being Lipschitz, and the rate of convergence of $T_t f$ to $f$, as $t \to 0^+$ (we are discussing some of their results using a continuous time $t$ for convenience; most of Coifman's and Leeb's derivations are done for dyadically discretized times. Moreover, most of the authors' results are in fact established without the assumption of symmetry and under a weaker condition than positivity of the kernel, namely, an appropriate integrability statement (see )). To prove the implication that Lipschitz implies an appropriate estimate on the rate of convergence, Coifman and Leeb make a quantitative assumption about the decay of
$$\int_X \rho_t(x, y)\, D(x, y)\, dy \tag{1}$$
for their distances $D$, namely, that
$$\int_X \rho_t(x, y)\, D(x, y)\, dy \le C t^{\alpha} \tag{2}$$
for some $\alpha > 0$. The authors show that their decay assumption holds for semigroups arising in many different settings (for which suitable decay and continuity assumptions are made on diffusion kernels relative to an intrinsic metric of the underlying space), and even for some examples of nonsymmetric diffusion kernels. Coifman and Leeb also establish that (2) above, in the case of positive diffusion kernels, is in fact equivalent to their conclusion about the rate of convergence of $T_t f$ to $f$, as $t \to 0^+$, for a Lipschitz function $f$. Additionally, Coifman and Leeb show that, in some of the settings they consider (with decay and continuity assumptions on the diffusion kernels relative to an intrinsic metric), their multiscale diffusion distance is equivalent to (localized) $d(x, y)^{\alpha}$, where $d$ is the intrinsic metric of the underlying space and $\alpha$ is a positive number strictly less than 1. The authors emphasize that $\alpha$ cannot be taken to equal 1.
In the present paper, we introduce a new family of diffusion distances generated by the diffusion semigroup $\{T_t\}_{t \ge 0}$. We provide several reasons as to why we think our definition is natural; in particular, we show that, for a convolution diffusion kernel on $\mathbb{R}^n$, we achieve $\alpha = 1$ in the discussion just above; i.e., we can recover (local) Euclidean distance to the "full" power 1.
The implication established in [7, 11] that smoothness of $f$ implies control of the speed of convergence of $T_t f$ to $f$ seems to us to be a more notable result than the converse (which the authors establish without assuming the decay of (1)). However, if $f$ is Lipschitz for the multiscale diffusion distance introduced in [7, 11], then, as the authors themselves point out, their assumed estimate (2) almost tautologically leads to the desired estimate for the speed of convergence of $T_t f$ to $f$.
The main reason for our current work is that we wish to avoid making any assumptions about the decay of (1) and still establish a correspondence between some version of smoothness of a function $f$ and convergence of $T_t f$ to $f$, as $t \to 0^+$. Our main contribution is to establish, under almost no assumptions, that local equicontinuity (in $t$) is equivalent to local convergence; i.e., local control of the differences $T_t f(x) - T_t f(y)$ for all small $t$ is equivalent to local control of the differences $T_t f(x) - f(x)$ for all small $t$. Here "local" is defined relative to a representative of our family of proposed diffusion distances.
Our paper is organized as follows. Following a notation and assumptions section (Section 2), we define our version of a natural diffusion distance in Section 3:
$$D_g(x, y) = \sup_{0 < t \le 1} g(t)\, \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} \tag{3}$$
for $g$ a bounded, nonnegative, increasing function on $(0, 1]$, with $g(t) \to 0$ as $t \to 0^+$. We are led to our definition by requiring that a diffusion distance has the property that, for all functions $f$ bounded in magnitude by 1, $T_t f$ be Lipschitz with respect to the distance, independent of the particular $f$ (of course, we expect the Lipschitz constant to grow as $t$ goes to 0). This requirement arises from the intuitively reasonable demand that diffusion be smoothing in some sense. We then discuss some other reasons why our resulting distance is natural. In particular, for diffusion semigroups with convolution kernels on $\mathbb{R}^n$ (this class includes the Poisson and heat kernels), our distance is equivalent to (local) Euclidean or sub-Euclidean distances for certain choices of the function $g$.
In Section 4, we make the assumption that balls of positive radius with respect to the distance $D_g$ have positive measure. We show there is an equivalent topology, which does not depend on the function $g$, for which a corresponding statement about positive measure is equivalent to our assumption. The latter requirement, in turn, seems to be a mild and reasonable one.
In the main section, Section 5, we define our version of local convergence of $T_t f$ to $f$, as well as local equicontinuity of the family $\{T_t f\}_{t \ge 0}$. Both definitions use our distance $D_g$. We then establish that local convergence is equivalent to local equicontinuity. We next prove a corollary which extends an a.e. convergence result of Stein in : for $t_0 > 0$, $T_{t + t_0} f$ converges locally to $T_{t_0} f$, as $t$ converges to $0^+$.
In the Appendix, we show that, for very general metrics $D$ on $X$, not necessarily arising from diffusion,
$$\int_X \rho_t(x, y)\, D(x, y)\, dy \to 0 \quad \text{a.e., as } t \to 0^+. \tag{4}$$
This result is clearly a weaker statement than (2), but has the advantage of holding under virtually no assumptions.
2. Notation and Assumptions
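Let $X$ be a topological space equipped with a positive $\sigma$-finite measure. For $t > 0$, $\rho_t(x, y)$ will denote a symmetric kernel on $X \times X$, with $\rho_t(x, y) \ge 0$ for all $x, y \in X$. We assume that $\rho_t$ satisfies the semigroup property:
$$\rho_{s+t}(x, y) = \int_X \rho_s(x, u)\, \rho_t(u, y)\, du \tag{5}$$
for all $s, t > 0$, and $x, y \in X$. In addition, we assume
$$\int_X \rho_t(x, y)\, dy = 1 \tag{6}$$
for all $t > 0$ and all $x \in X$. We will refer to a kernel satisfying the conditions above as a symmetric diffusion kernel (at time $t$). A typical example for $\rho_t$ is the heat kernel on a Riemannian manifold (see , for example).
For a function $f$ on $X$ for which the following definition makes sense (for instance, $f$ in some $L^p(X)$, $1 \le p \le \infty$), we define the symmetric diffusion operator $T_t$, for $t > 0$, by
$$T_t f(x) = \int_X \rho_t(x, y)\, f(y)\, dy. \tag{7}$$
We define $T_0$ to be the identity map. Note that, for all $t > 0$, $\|T_t f\|_1 \le \|f\|_1$, by Fubini's theorem, that clearly $\|T_t f\|_\infty \le \|f\|_\infty$, and hence $\|T_t f\|_p \le \|f\|_p$, for $1 \le p \le \infty$, by interpolation.
To avoid degeneracy, e.g., each $T_t$ being the averaging operator on a space of finite mass, we make an additional assumption: $T_t f \to f$ in $L^2$, as $t \to 0^+$.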
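The symmetric diffusion operator $T_t$ has the following properties of a symmetric diffusion semigroup:
(i) $T_0$ is the identity
(ii) $T_{s+t} = T_s T_t$, for all $s, t \ge 0$
(iii) $\|T_t f\|_p \le \|f\|_p$, for $1 \le p \le \infty$
(iv) $T_t$ is a self-adjoint operator on $L^2$
(v) $T_t f \to f$ in $L^2$, as $t \to 0^+$
(vi) $T_t f \ge 0$ if $f \ge 0$
(vii) $T_t 1 = 1$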
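The properties listed above can be checked by hand on a toy example. The following is a minimal sketch under assumptions that are not part of the paper: the space is a five-point set with counting measure, time is discrete, and the kernel is a lazy random walk on a cycle (an illustrative choice). All function and variable names are hypothetical.

```python
# Finite analog (an illustration, not the paper's setting): X = {0,...,4} with
# counting measure and discrete times n = 0, 1, 2, ...; P is a lazy random walk
# on a 5-cycle, so T_n f = P^n f plays the role of the diffusion operator.

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(p, n):
    """n-th power of p; n = 0 gives the identity (T_0 is the identity)."""
    size = len(p)
    out = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        out = matmul(out, p)
    return out

def apply(kernel, f):
    """(T f)(x) = sum over y of kernel(x, y) * f(y)."""
    return [sum(k * v for k, v in zip(row, f)) for row in kernel]

N = 5
P = [[0.0] * N for _ in range(N)]
for i in range(N):
    P[i][i] = 0.5
    P[i][(i + 1) % N] = 0.25
    P[i][(i - 1) % N] = 0.25

P2, P3, P5 = matpow(P, 2), matpow(P, 3), matpow(P, 5)
P2P3 = matmul(P2, P3)
f = [3.0, -1.0, 0.5, 2.0, -4.0]
Tf = apply(P3, f)

# Symmetry of the kernel, unit mass of each row, and the semigroup property.
symmetric = all(abs(P3[i][j] - P3[j][i]) < 1e-12 for i in range(N) for j in range(N))
mass_one = all(abs(sum(row) - 1.0) < 1e-12 for row in P3)
semigroup = all(abs(P5[i][j] - P2P3[i][j]) < 1e-12 for i in range(N) for j in range(N))
# Contraction in the sup norm and in the L^1 norm (counting measure).
sup_contraction = max(abs(v) for v in Tf) <= max(abs(v) for v in f) + 1e-12
l1_contraction = sum(abs(v) for v in Tf) <= sum(abs(v) for v in f) + 1e-12
print(symmetric, mass_one, semigroup, sup_contraction, l1_contraction)
```

A doubly stochastic symmetric matrix is used so that the discrete analogs of symmetry and unit mass hold at the same time; the continuity condition as time tends to zero has no counterpart in discrete time and is not modeled.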
See Stein’s book , in which the author derives various harmonic analysis results for symmetric diffusion semigroups without explicitly using kernels.
3. A Natural Diffusion Distance
We now define our diffusion distance.
Definition 1. For $g$ a bounded, nonnegative, increasing function on $(0, 1]$, with $g(t) \to 0$ as $t \to 0^+$, and strictly positive on the interval $(0, 1]$, define the distance $D_g$ by
$$D_g(x, y) = \sup_{0 < t \le 1} g(t)\, \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}. \tag{8}$$
It is clear that the distance $D_g$ satisfies the triangle inequality. Note that the restriction that $g$ is bounded in the above supremum has the effect of making all "large" distances comparable to a constant, but this is not a drawback for smoothness considerations.
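To make the definition concrete, here is a minimal sketch of a discrete analog, assuming a finite state space, discrete times $n = 1, \dots, T$, a lazy random walk on a six-point cycle as the kernel, and the weight $g(n) = n/T$. All of these choices and all names are illustrative assumptions, not the paper's construction.

```python
# Discrete-time, finite-state analog of a weighted-supremum diffusion distance.
# Assumptions (illustrative only): lazy walk on a 6-cycle, times n = 1..T_MAX,
# weight g(n) = n / T_MAX (bounded, increasing, strictly positive).

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cycle_kernel(n):
    """Lazy random walk on an n-cycle: stay with prob. 1/2, step with 1/4 each way."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        p[i][i] = 0.5
        p[i][(i + 1) % n] = 0.25
        p[i][(i - 1) % n] = 0.25
    return p

def kernel_powers(p, max_n):
    """Return [P^1, P^2, ..., P^max_n]."""
    powers = [p]
    for _ in range(max_n - 1):
        powers.append(matmul(powers[-1], p))
    return powers

def l1_diff(kernel, x, y):
    """L^1 distance between the rows of the kernel started at x and at y."""
    return sum(abs(a - b) for a, b in zip(kernel[x], kernel[y]))

def diffusion_distance(powers, g, x, y):
    """Supremum over the available times of g(n) * ||P^n(x, .) - P^n(y, .)||_1."""
    return max(g(n) * l1_diff(powers[n - 1], x, y)
               for n in range(1, len(powers) + 1))

N, T_MAX = 6, 8
POWERS = kernel_powers(cycle_kernel(N), T_MAX)

def g(n):
    return n / T_MAX

D = [[diffusion_distance(POWERS, g, x, y) for y in range(N)] for x in range(N)]

triangle_ok = all(D[x][z] <= D[x][y] + D[y][z] + 1e-12
                  for x in range(N) for y in range(N) for z in range(N))
symmetric_ok = all(abs(D[x][y] - D[y][x]) < 1e-12 for x in range(N) for y in range(N))
bounded_ok = all(D[x][y] <= 2.0 * g(T_MAX) + 1e-12 for x in range(N) for y in range(N))
print(triangle_ok, symmetric_ok, bounded_ok)
```

The triangle inequality holds because each term in the supremum is a weighted $L^1$ norm, and the bound by twice the largest weight reflects the remark that a bounded weight makes all large distances comparable to a constant.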
We would now like to discuss why we are using this particular diffusion distance and why we think it is a natural choice. Our starting point is the desire that, for a reasonable diffusion distance , should be “smooth” for , even for “rough” functions . This intuitive requirement is suggested by the idea that a diffusion semigroup be smoothing, in some sense. It would further be natural that the smoothness decays, for a general , as . We are thus led to impose a Lipschitz-like requirement, namely, that, for a diffusion distance , and for ,It is easy to see thatNote that, for any and , is decreasing in , since, for ,using (5) and (6). Letting we thus see that is increasing, and from (10) we conclude thatThis last inequality motivates our Definition 1 of . The restriction to is to ensure that is finite for all and and is not stringent, due to the fact that is decreasing in and that for smoothness purposes we need to only concentrate on points and which are near each other.
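The two facts used in this motivation, the identification of the supremum over bounded functions with an $L^1$ norm and the monotonicity of that norm in time, follow from the kernel form of the diffusion operator together with (5) and (6). The following is a sketch of one way to verify them, writing $\rho_t$ for the kernel and $T_t$ for the operator (this notation is assumed here).

```latex
% Supremum over bounded functions: the bound is attained by
% f = sgn( rho_t(x, .) - rho_t(y, .) ).
\begin{align*}
\sup_{\|f\|_\infty \le 1} \bigl| T_t f(x) - T_t f(y) \bigr|
  &= \sup_{\|f\|_\infty \le 1}
     \Bigl| \int_X \bigl( \rho_t(x,u) - \rho_t(y,u) \bigr) f(u)\, du \Bigr|
   = \bigl\| \rho_t(x,\cdot) - \rho_t(y,\cdot) \bigr\|_{L^1}.
\end{align*}

% Monotonicity in time: semigroup property (5), nonnegativity of the kernel,
% Tonelli's theorem, and the unit-mass condition (6), for s, t > 0.
\begin{align*}
\bigl\| \rho_{s+t}(x,\cdot) - \rho_{s+t}(y,\cdot) \bigr\|_{L^1}
  &= \int_X \Bigl| \int_X \bigl( \rho_t(x,u) - \rho_t(y,u) \bigr)
       \rho_s(u,z)\, du \Bigr|\, dz \\
  &\le \int_X \bigl| \rho_t(x,u) - \rho_t(y,u) \bigr|
       \Bigl( \int_X \rho_s(u,z)\, dz \Bigr) du
   = \bigl\| \rho_t(x,\cdot) - \rho_t(y,\cdot) \bigr\|_{L^1}.
\end{align*}
```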
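A further indication of the naturality of our proposed diffusion distance is that the $L^1$ norm of the difference of two probability densities, $\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1}$, occurring in the definition of $D_g$, is the (scaled) total variation distance between the probability distributions $P_{t,x}$ and $P_{t,y}$, i.e.,
$$\|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} = 2 \sup_{A} \bigl| P_{t,x}(A) - P_{t,y}(A) \bigr|. \tag{13}$$
Here, $P_{t,x}$ is the measure given by $P_{t,x}(A) = \int_A \rho_t(x, u)\, du$, and $P_{t,y}$ is the measure given by $P_{t,y}(A) = \int_A \rho_t(y, u)\, du$ for measurable $A$; the supremum is taken over all measurable $A$ (see Chapter 4 of ).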
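The relation between the $L^1$ norm and the total variation distance can be checked by brute force on a finite set. The sketch below uses two hypothetical probability vectors on four points and enumerates every subset; the names and the data are illustrative assumptions.

```python
from itertools import combinations

def l1_distance(p, q):
    """L^1 distance between two probability vectors (counting measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def total_variation(p, q):
    """Maximum over all subsets A of |P(A) - Q(A)|, by exhaustive enumeration."""
    points = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for subset in combinations(points, size):
            gap = abs(sum(p[i] for i in subset) - sum(q[i] for i in subset))
            best = max(best, gap)
    return best

# Two illustrative distributions on a four-point space.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.125, 0.125, 0.25, 0.5]

l1 = l1_distance(p, q)
tv = total_variation(p, q)
print(l1, 2 * tv)
```

The maximizing subset is the set of points where the first density exceeds the second, which is why the $L^1$ norm equals exactly twice the total variation distance.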
As a final argument for the naturality of our proposed diffusion distance, we calculate for a special case considered by the authors of  (for their own version of diffusion distances). We take , , and assume that the diffusion kernel has the form . Here, and is a nonnegative radial function whose gradient is also in . The case is for the heat kernel (with the appropriate ), and the case is for the Poisson kernel (with the appropriate ). Now,where we made the change of variables . Let . Then it is easy to see that is radial and, for a “generic” , we have the estimates: if , and if . Here, is the usual Euclidean norm. Using this observation, and (14), we obtain the following (for this special case).
Proposition 2. For , if , and if . For , .
Proof. Using the notation for the special case above, we need to estimate .
Let us first consider the situation when . Then, for , , so using the estimate for mentioned before the proposition.
Next, consider the situation when . Let . Note that .
When , we have that , so If , the maximum of the right hand side occurs at and equals If , the maximum of the right hand side occurs at and equals .
When , we have that , so and the maximum of the right hand side occurs at and equals Note that if , since , .
Combining the above discussions for the two ranges of values of , the result follows.
Thus, for this special case of , , and , which includes both the heat kernel and the Poisson kernel, our definition of diffusion distance gives (local) Euclidean or sub-Euclidean distance (depending on the relative sizes of and ). This result seems appropriate.
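As a numerical illustration of this kind of behavior, the following sketch treats the one-dimensional heat kernel $\rho_t(x, y) = (4\pi t)^{-1/2} e^{-(x-y)^2/(4t)}$ and assumes the weight $g(t) = t^{1/2}$ with the supremum taken over a grid in $(0, 1]$. These choices, and all names, are assumptions made for the example. For this kernel the $L^1$ norm of the difference has the closed form $2\,\mathrm{erf}\bigl(|x - y| / (4\sqrt{t})\bigr)$, which the code checks against direct quadrature.

```python
import math

def heat_kernel(t, x, y):
    """One-dimensional heat kernel at time t."""
    return math.exp(-(x - y) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def l1_by_quadrature(t, r, half_width=10.0, steps=20000):
    """Midpoint-rule estimate of ||rho_t(0, .) - rho_t(r, .)||_1."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        u = -half_width + (k + 0.5) * h
        total += abs(heat_kernel(t, 0.0, u) - heat_kernel(t, r, u))
    return total * h

def l1_closed_form(t, r):
    """Closed form of the same norm: 2 * erf(r / (4 * sqrt(t)))."""
    return 2.0 * math.erf(r / (4.0 * math.sqrt(t)))

def diffusion_distance(r, alpha=0.5, grid=400):
    """sup over t = k/grid in (0, 1] of t**alpha * ||rho_t(0, .) - rho_t(r, .)||_1."""
    return max((k / grid) ** alpha * l1_closed_form(k / grid, r)
               for k in range(1, grid + 1))

# Sanity check of the closed form at one (t, r) pair.
quadrature_gap = abs(l1_by_quadrature(0.3, 0.5) - l1_closed_form(0.3, 0.5))

# Ratio of the distance to the Euclidean separation r for several small r.
ratios = {r: diffusion_distance(r) / r for r in (0.01, 0.1, 0.5)}
print(quadrature_gap, ratios)
```

With this weight the supremum is attained at $t = 1$, so the distance equals $2\,\mathrm{erf}(|x - y|/4)$, and the ratio to $|x - y|$ stays close to $1/\sqrt{\pi}$ for small separations. This is consistent with the (local) Euclidean behavior described above.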
4. A Geometric Assumption about the Measure on
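We make the following reasonable assumption about our distance $D_g$: for any $x \in X$ and any $r > 0$, the ball $B_r(x) = \{y \in X : D_g(x, y) < r\}$, of radius $r$ and center $x$, has positive measure.
To justify the statement that this assumption is indeed reasonable, we first define another family of subsets of $X$. For any $x \in X$, $t > 0$, and $\varepsilon > 0$, let
$$N_{t, \varepsilon}(x) = \bigl\{ y \in X : \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} < \varepsilon \bigr\}.$$
We then have the following equivalence of topologies induced by the sets $B_r(x)$ and $N_{t, \varepsilon}(x)$:
Proposition 3. For any $x \in X$ and any $r > 0$, there exist $t_0 > 0$ and $\varepsilon > 0$ such that $N_{t_0, \varepsilon}(x) \subseteq B_r(x)$. Conversely, for any $x \in X$, $t > 0$, and $\varepsilon > 0$, there exists an $r > 0$ such that $B_r(x) \subseteq N_{t, \varepsilon}(x)$.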
Proof. Fix an and an . We first show that there exist and such that .
Since we made the assumption that for the function used in defining the distance , there exists a with . Let , where . Now, pick an arbitrary .
For , since is increasing, we see that using the fact that the norm of is 1 for any and .
Now consider the case when . Note that, by definition of , we have that . Then, for this range of , we observe that where we have used that is decreasing in ; see (11).
We conclude (see (8)) that and hence .
For the converse, fix , and . We will show that there exists a such that .
Since, for any , is decreasing in (see (11)), we clearly have that for any . Thus, we may assume . Let . Then, for any , we have that . Hence, using Definition 1 of the distance , we obtain Thus, , and we have that .
Returning to our assumption that, for any $x \in X$ and any $r > 0$, the ball $B_r(x) = \{y \in X : D_g(x, y) < r\}$ has positive measure, Proposition 3 shows that it is equivalent to require the following: for any $x \in X$, $t > 0$, and $\varepsilon > 0$, the set $N_{t, \varepsilon}(x) = \{y \in X : \|\rho_t(x, \cdot) - \rho_t(y, \cdot)\|_{L^1} < \varepsilon\}$ has positive measure. Note that the definition of the sets $N_{t, \varepsilon}(x)$ is more "universal" than that of the balls $B_r(x)$, since the former do not involve the function $g$.
The assumption that, for any $x \in X$, $t > 0$, and $\varepsilon > 0$, the set $N_{t, \varepsilon}(x)$ has positive measure appears to us to be a very natural, and mild, one. In words, this requirement is saying that, for any time $t$ and any $\varepsilon > 0$, the set of points in our space which have not diffused more than $\varepsilon$ away (in the $L^1$ sense) from the diffused point $x$, at time $t$, is not "thin" with respect to the underlying measure on $X$. This assumption seems reasonable in both the discrete case (each point has positive mass, and $x$ itself is "enough") and the continuous case (every point has "many" arbitrarily close points in the sense of diffusion).
5. Local Convergence Is Equivalent to Local Equicontinuity
In this section, we define local convergence and local equicontinuity for our situation and show that the two concepts are equivalent under our assumptions.
In what follows, is a symmetric diffusion operator as defined in Section 2.
Definition 4. Let , . Note that is actually an equivalence class of functions on the space . Suppose there exists a particular representative of this equivalence class, which we will also call , such that this representative is defined at every point of , and for every , there exist and so that , for all with and all . We then say converges to locally at .
We also make the following.
Definition 5. Let , . Suppose there exists a particular representative of the equivalence class specified by and which we will also call , such that this representative is defined at every point of , and for every , there exist and with the property that, for all , we have and for all with , . We then say the family is locally equicontinuous (in ) at .
Our main result is the following.
Proposition 6. For and any , the following are equivalent: (i) converges to (the representative) locally at (ii)The family is locally equicontinuous at Moreover, if a representative satisfies one of these statements, the same representative satisfies the other statement.
Proof. We first show that local convergence at implies local equicontinuity at . We thus begin by assuming that converges to a representative locally at .
First, we establish continuity of this representative at . Fix . By the assumption, there exist and such that , for all with and all . Then, for any , using the definition of the distance , we see that Since we assumed that if , we have that . Thus, if , and continuity of at is shown.
Next, note that Let and be as above, i.e., , for all with and all . Since we have already shown that is continuous at , there exists a such that for . Let . Then, for and , we see that . Hence, the local equicontinuity of the family at follows.
Conversely, we now show that local equicontinuity at implies local convergence at . We thus begin by assuming that the family is equicontinuous at .
Fix . By the assumption, there exist and such that, for the representative , and , for all and all with . In Section 4, we made the assumption that all balls of positive radius have positive measure. Using Stein’s Maximal Theorem (see Chapter III, §3 in ), a.e. So there is a such that . Now, for ,We estimate the first term on the right hand side of the above inequality as follows: for all , since . For the second term, we use that : there exists a such that , for all satisfying . Finally, for the third term, we see that since .
Thus, for all with , and for any , we obtain that , which concludes the proof of the converse.
In the proof above, we used Stein’s Maximal Theorem (see Chapter III, §3 in ) to state that a.e. Stein’s a.e. convergence result, for say, is the main place in our paper where the symmetry of the operators is needed: Stein requires symmetry to prove his Maximal Theorem.
We immediately have the following.
Corollary 7. Let . Fix . Then for any , converges locally to at .
Proof. By Proposition 6, it suffices to show that is locally equicontinuous at . Fix . Let for and for . For any , we have that using the definition of the distance and the function , that is increasing, and inequality (11). Then, for , we see that implies that , and we have shown local equicontinuity at .
Using our notation, Stein in  mentions that for almost all , since he proves that is a real-analytic function of for almost all . Corollary 7 extends Stein’s result (under our assumption discussed in Section 4) to show local convergence with respect to the distance .
6. Conclusions and Future Work
In this paper, we have defined a diffusion distance which is natural if one imposes a reasonable Lipschitz condition on diffused versions of arbitrary bounded functions. We have next shown that the mild assumption that balls of positive radius have positive measure is equivalent to a similar, and an even milder looking, geometric demand. In the main part of the paper, we have established that local convergence of $T_t f$ to $f$ (a representative) at a point is equivalent to local equicontinuity of the family $\{T_t f\}_{t \ge 0}$ at that point.
It may well be useful to have a quantitative estimate on the rate of convergence of $T_t f$ to $f$ under the assumption that $f$ is Lipschitz, say, with respect to some distance $D$ (where $D$ may be our $D_g$). As essentially pointed out in the papers [7, 11], a key issue is whether, and how rapidly,
$$\sup_{x \in X} \int_X \rho_t(x, y)\, D(x, y)\, dy \to 0, \quad \text{as } t \to 0^+. \tag{32}$$
In the Appendix below, we show that, for very general metrics $D$ on $X$, not necessarily arising from diffusion,
$$\int_X \rho_t(x, y)\, D(x, y)\, dy \to 0 \quad \text{a.e., as } t \to 0^+.$$
This result is certainly far from establishing the convergence in (32), much less a quantitative estimate.
We plan to continue exploring for which (diffusion) distances the convergence in (32) holds and an estimate can be obtained.
Proposition 8. Let be a metric on with the following properties: (1)(2) is separable with respect to the metric , i.e., it contains a countable dense subset(3)There exists a so that for every (the bound need not be uniform in ). Here, denotes the measure of the ball Then,
To prove the proposition, we first establish the following.
Lemma 9. For any , if is such that , thenfor almost all .
Proof. Let , where is the characteristic function of the ball . Since , we see that . Using Stein’s Maximal Theorem (see Chapter III, §3 in ), we conclude that