Research Article | Open Access
Aline Kurtzmann, "Convergence in Distribution of Some Self-Interacting Diffusions", Journal of Probability and Statistics, vol. 2014, Article ID 364321, 13 pages, 2014. https://doi.org/10.1155/2014/364321
Convergence in Distribution of Some Self-Interacting Diffusions
The present paper is concerned with some self-interacting diffusions living on . These diffusions are solutions to stochastic differential equations: , where is the empirical mean of the process , is an asymptotically strictly convex potential, and is a given positive function. We study the asymptotic behaviour of for three different families of functions . If with small enough, then the process converges in distribution towards the global minima of , whereas if or if , then converges in distribution if and only if.
The aim of this paper is to obtain necessary and sufficient conditions for the convergence in distribution of a self-interacting diffusion living on . Consider a smooth potential and a map . We study the asymptotic behaviour of the self-interacting diffusion given by where is a standard Brownian motion and denotes the empirical mean of the process : This is a model of reinforcement that could be used to represent the (simplified) behaviour of some social insects. Some insects, such as ants, mark their paths with pheromones, which serve as a guide for other ants returning to the nest. The trail of pheromones is denoted by and its evaporation by . Despite this evaporation, the path is reinforced and the insects gradually manage to find the best route.
The same model has already been studied by Chambeu and Kurtzmann , in the case of an unbounded increasing function . The authors proved that, under certain conditions, the process satisfies a kind of pointwise ergodic theorem and that if admits a unique minimum at 0, then converges almost surely. In this paper, we suppose neither that increases to infinity nor that admits a unique minimum at 0. This changes the asymptotic behaviour of , even though will still converge in distribution in most cases. We essentially use two different techniques here. The first is the well-known theory of simulated annealing, which has been developed extensively since the 1980s and has a huge literature, whereas the second is simply a change of scale combined with a change of “speed measure.”
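As a rough numerical illustration of this kind of model, the following Euler-Maruyama sketch simulates a one-dimensional process whose drift pulls it towards the wells of a potential centred at its own empirical mean. The drift form `-g(t) * grad_V(x - mean)`, the double-well potential, the choice of `g`, and the function name `simulate_self_interacting` are all assumptions made for illustration; they are not taken from the equations of this paper.

```python
import numpy as np

def simulate_self_interacting(grad_V, g, x0=0.0, T=20.0, dt=1e-3, seed=0):
    """Euler-Maruyama sketch for dX = dB - g(t) * grad_V(X - mean) dt (assumed form)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.empty(n + 1)
    x[0] = x0
    integral = 0.0  # running Riemann sum approximating the integral of X_s ds
    mean = x0       # empirical mean, initialised at x0 by convention
    for i in range(n):
        t = i * dt
        drift = -g(t) * grad_V(x[i] - mean)
        x[i + 1] = x[i] + drift * dt + np.sqrt(dt) * rng.standard_normal()
        integral += x[i + 1] * dt
        mean = integral / ((i + 1) * dt)
    return x, mean

# Hypothetical choices: V(y) = y^4/4 - y^2/2 (double well), g(t) = 1 + log(1 + t)
path, mean = simulate_self_interacting(
    grad_V=lambda y: y**3 - y,
    g=lambda t: 1.0 + np.log1p(t),
)
```

The returned `mean` is the discrete empirical mean of the simulated path, so plotting `path` against time gives a quick visual check of how the process settles relative to its own average.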
Let us briefly explain the simulated annealing method. An important question for physical systems is to find the global minimum energy states of the system. Experimentally, the ground states are reached by chemical annealing. One first melts a substance and then cools it slowly, being careful to pass slowly through the freezing temperature. If the temperature decreases too rapidly, then the system does not end up in a ground state, but in a local nonglobal minimum. On the other hand, if the temperature decreases too slowly, then the system approaches the ground states very slowly. The competition between these two effects determines the optimal speed of cooling, that is, the annealing schedule. The study of simulated annealing has involved the theory of nonhomogeneous Markov chains and diffusion processes, large deviation theory, spectral analysis of operators, and singular perturbation theory. Pioneering work was done by Freidlin and Wentzell . The initial problem consists in finding the global minima of a given function . Actually, one has to study the diffusion Markov process in given by the Langevin-type Markov diffusion . If the temperature is constant for a sufficiently large amount of time, then the process and the fixed temperature process behave approximately the same at the end of that time interval. The optimal annealing schedule, that is, for the convergence criterion , where denotes the set of all the global minima of , was first determined by Hajek  for a finite state space. Chiang et al.  studied the convergence rate of via the large deviations of the transition density of . They were among the first to show the convergence of the simulated annealing algorithm for , for large enough, related to the second eigenvalue of the corresponding (to ) infinitesimal generator. Finally, Holley and Stroock  initiated a new method and proved, in the discrete case, the convergence of the simulated annealing algorithm via the Sobolev inequality.
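The cooling mechanism described above can be sketched with a Langevin diffusion whose temperature decreases logarithmically in time. This is a generic textbook scheme, not the construction used in this paper: the dynamics `dY = -grad_U(Y) dt + sqrt(2 T(t)) dB`, the schedule `T(t) = c / log(2 + t)`, the test function `U`, and the name `anneal` are all illustrative assumptions.

```python
import numpy as np

def anneal(grad_U, y0, c=1.0, T=100.0, dt=5e-3, seed=0):
    """Euler-Maruyama sketch of Langevin simulated annealing with logarithmic cooling."""
    rng = np.random.default_rng(seed)
    y = float(y0)
    for i in range(int(round(T / dt))):
        temp = c / np.log(2.0 + i * dt)  # temperature decreases slowly to zero
        noise = np.sqrt(2.0 * temp * dt) * rng.standard_normal()
        y += -grad_U(y) * dt + noise
    return y

# Hypothetical tilted double well U(y) = (y^2 - 1)^2 + 0.3 * y:
# two local minima, the one near y = -1 being the global minimum.
y_end = anneal(lambda y: 4.0 * y * (y**2 - 1.0) + 0.3, y0=1.0)
```

Whether a run ends near the global minimum depends on the constant `c` and on the horizon `T`; a smaller `c` cools faster and makes freezing in the shallower well more likely, which is the trade-off discussed in the paragraph above.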
They went further in their study with Holley et al. . Later, Miclo  proved, through some functional inequalities, that the free energy (i.e., the relative entropy of the distribution of the process at time with respect to the invariant probability measure for the elliptic operator considered as a time-homogeneous operator by fixing ) satisfies a differential inequality, which implies (under some decreasing evolution of the temperature to zero) the convergence of the process to the global minima of the potential. And if the temperature decreases too fast to zero, then the potential can freeze in a local minimum (depending on the initial condition) and so the process converges to this local minimum.
We begin by studying the -valued Markov process , which satisfies the following SDE: We will adapt the simulated annealing method to for functions large enough (i.e., does not go to zero) to prove the convergence in distribution of .
We wish to point out that a one-dimensional Brownian motion in a time-dependent potential has recently been studied by Gradinaru and Offret : with and . This is quite close in spirit to the study of our process , even if the authors suppose in  that both and are polynomial. They obtain conditions for the recurrence, transience, and convergence of the studied process . We refer to the survey of Ivanov et al.  for the existence and uniqueness of solutions to such equations. In the present paper, we do not suppose that is polynomial and the dimension is and thus we obtain less precise results.
The remainder of the paper is organized as follows. First, in Section 2, we introduce some useful tools, such as the logarithmic Sobolev inequality and the Kullback information. They both will be needed for the simulated annealing study. Section 3 is devoted to the simulated annealing method in the case when behaves asymptotically as . In this part, we will prove the (pointwise) ergodicity of the process and the convergence in distribution of , depending on the potential . Finally, Section 4 deals with the convergence in distribution of when and , depending on the asymptotics of .
2. Some Useful Tools
2.1. Assumptions and Existence
In the following, denotes the Euclidean scalar product. We denote by the set of probability measures on . We denote by the function . We assume that the mapping is . The precise hypothesis on will be given at the beginning of each section.
In the sequel, the technical assumptions on the potential are the following:
(1) (regularity and positivity) and ;
(2) (convexity) , where is (>0)-strictly uniformly convex and is a compactly supported function and there exists such that is -Lipschitz;
(3) (growth) there exists such that for all , we have
We also assume that has a finite number of critical points. Let be the set of the saddle points and local maxima of and let be the set of the local minima of , such that the Hessian matrix is nondegenerate at every local minimum. Without any loss of generality, we suppose that .
Remark 1. The case of quadratic growth is excluded here, as it has been fully studied in .
Let us first prove the global strong existence and uniqueness of the process .
Proposition 2. For any , there exists a unique global strong solution of (1).
Proof. The local existence and uniqueness of such a process is standard. We only need to prove here that , hence, (because ), does not explode in a finite time. To this aim, we apply Itô's formula to the function : and introduce the sequence of stopping times and By the convexity condition, we have , and by the condition (5), there exists such that .
2.2.1. Logarithmic Sobolev Inequality
Definition 3. The probability measure satisfies the logarithmic Sobolev inequality, with the constant denoted by , if for every function , we have
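For orientation, one common normalization of the logarithmic Sobolev inequality for a probability measure $\mu$ is sketched below. The symbols $\mu$, $f$, and $C$, and in particular the placement of the factor 2, are assumptions; the convention used in this paper may differ by a multiplicative constant.

```latex
\int f^{2} \log\!\left( \frac{f^{2}}{\int f^{2}\, d\mu} \right) d\mu
\;\le\; 2C \int |\nabla f|^{2}\, d\mu
\qquad \text{for all smooth } f \text{ with } \int f^{2}\, d\mu < \infty .
```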
Let denote the density of the semigroup corresponding to the nonhomogeneous Markov process defined by We will specify later the precise form of and . We associate to this process the probability measure , where is the normalization constant of .
Lemma 4. The family of probability measures satisfies a logarithmic Sobolev inequality .
Proof. We use the celebrated Bakry-Emery -criterion, see . We recall that, to the operator , we associate the operator “carré du champ”; that is, (for every function )
Then, we define the operator as
The -criterion asserts that if there exists a positive constant such that , then satisfies a logarithmic Sobolev inequality, with the constant .
An easy calculation, for any function of class , leads to
As (and also ) is strictly convex off a compact set, we have the decomposition as in the convexity hypothesis. We apply the -criterion of Bakry-Emery to the function and we get that . Thus, the probability measure satisfies the inequality . We conclude, by the perturbation lemma due to Holley and Stroock , that the measure satisfies a logarithmic Sobolev inequality with a constant less than or equal to , where .
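For reference, the objects used in the proof above take the following standard form for a diffusion generator $L = \Delta - \nabla W \cdot \nabla$ with invariant measure proportional to $e^{-W}$. The generator normalization and the symbols $W$ and $\rho$ are assumptions made for this sketch; with a different normalization of $L$ (for instance a factor $1/2$ in front of the Laplacian) the constants change accordingly.

```latex
\begin{aligned}
\Gamma(f,g) &= \tfrac12\bigl( L(fg) - f\,Lg - g\,Lf \bigr) = \nabla f \cdot \nabla g, \\
\Gamma_2(f) &= \tfrac12\, L\,\Gamma(f,f) - \Gamma(f, Lf)
             = \|\nabla^2 f\|_{\mathrm{HS}}^{2} + \langle \nabla f, \nabla^2 W\, \nabla f \rangle, \\
\nabla^2 W \ge \rho\, \mathrm{Id},\ \rho > 0
  \;&\Longrightarrow\; \Gamma_2(f) \ge \rho\, \Gamma(f,f)
  \;\Longrightarrow\;
  \int f^2 \log\!\left(\frac{f^2}{\int f^2\, d\mu}\right) d\mu \le \frac{2}{\rho} \int |\nabla f|^2\, d\mu .
\end{aligned}
```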
2.2.2. Kullback Information
Definition 5. We define the free energy (up to an additive constant), known as the relative Kullback information, of a probability measure with respect to a probability measure by If we suppose that (resp., ) has the density (resp., ) with respect to the Lebesgue measure , then one has
In this paper, we will first prove the decrease to zero of the relative free energy of the law of with respect to . The classical Csiszár-Kullback-Pinsker inequality relates the total variation norm to the free energy in the following way (see for instance ): So as the total variation norm metrizes the convergence in distribution, once we have proven that the measure converges weakly to a measure and goes to zero, then the distribution of converges to . As is the time-shifted process , we obtain this way that converges in distribution to .
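As a reminder, the Csiszár-Kullback-Pinsker inequality is usually stated as follows, where $I(\mu \mid \nu)$ denotes the relative entropy (Kullback information) and the total variation norm is normalized as the $L^1$ distance between densities. The notation and this choice of normalization are assumptions; with the convention $\sup_A |\mu(A) - \nu(A)|$ the constant becomes $1/2$ instead of $2$.

```latex
\|\mu - \nu\|_{\mathrm{TV}}^{2} \;\le\; 2\, I(\mu \mid \nu),
\qquad
I(\mu \mid \nu) = \int \log\!\left( \frac{d\mu}{d\nu} \right) d\mu ,
\qquad
\|\mu - \nu\|_{\mathrm{TV}} = \sup_{|f| \le 1} \left| \int f\, d\mu - \int f\, d\nu \right| .
```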
Our strategy to show that goes to zero is the following. To shorten notation, let be the distribution law of the process conditioned on . We recall that the family of probability measures satisfies a logarithmic Sobolev inequality . We have also . So, we choose satisfying and we will show in Corollary 13 the existence of such that
2.2.3. Asymptotic Pseudotrajectories
In Section 4, we will use the notion of asymptotic pseudotrajectory, introduced by Benaïm and Hirsch . It is particularly useful for analyzing the long-term behaviour of stochastic processes, considered as approximations of solutions of an ordinary differential equation (the ODE method).
Definition 6. The process is an asymptotic pseudotrajectory for the flow if for all ,
It is shown in  that if is an asymptotic pseudotrajectory for , then the -limit set of the flow generated by is the same as the -limit set of the process .
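In the standard formulation of Benaïm and Hirsch, the defining property of an asymptotic pseudotrajectory reads as below. The symbols $X$ for the process, $\Phi$ for the flow, and $d$ for the distance on the state space are notational assumptions made for this sketch.

```latex
\lim_{t \to \infty}\; \sup_{0 \le h \le T} d\bigl( X(t+h),\, \Phi_h(X(t)) \bigr) = 0
\qquad \text{for every } T > 0 .
```

In words, over any fixed time window of length $T$, the process shadows the flow started from its current position with an error that vanishes as the starting time grows.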
3. The Simulated Annealing Method
Assume that the mapping is asymptotically equivalent (up to a multiplicative positive constant) to and satisfies such that and for all , , where is the generalized inverse of .
Instead of considering , we consider the time-changed process . This last process satisfies the following SDE: where is a Brownian motion such that has the same law as .
3.1. Convergence in Distribution towards the Global Minima of
We define and . The process satisfies where we have defined . Actually, we will prove that this nonhomogeneous Markov process converges in distribution to a measure that could correspond to its “invariant” probability measure. Of course, if we suppose that and , then the convergence in distribution is obvious. It happens that the spectral gap appears naturally in our study. Heuristically, when the time is of order , the process is very close to the probability measure It remains to show the convergence of when goes to infinity.
Let be the operator defined by . As goes to infinity as , the theory of Schrödinger operators (see, e.g., [13, Theorem 13.6]) implies that is self-adjoint in and the spectrum of is discrete: . The subspace corresponding to the first eigenvalue is composed of the constant functions and so Our first aim is to compute the eigenvalue and study its behaviour when .
Lemma 7. Let be fixed. The probability measure converges weakly, as , to . Moreover, exists and is denoted by .
Proof. We only need to recall that diverges with . More explicitly, the normalization constant is Let be the compact set . There exists a constant such that is included in the ball centered in 0 and with radius . Then, on one hand, we get On the other hand we obtain But we know by the Laplace formula (see ) that where are the global minima of (we recall that they form a finite set). As a consequence, By the same method, if is a continuous function with compact support containing, for example, only the global minimum , we have This gives the explicit form of .
Consider for a moment . We remark that converges to when goes to infinity. Hwang established in  that converges weakly when converges to zero. Let be the set of the global minima of . Hwang has proved the following:
(i) if (where is the Lebesgue measure on ), then converges weakly to ;
(ii) if then converges weakly to
(iii) more generally, suppose that is the finite union of some smooth manifolds () and each component is a compact connected smooth manifold and the determinant of the Hessian (normal to in ) is not identically zero. Then, there exists a probability measure , on the highest dimensional manifolds, such that converges weakly to .
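The concentration phenomenon for finitely many nondegenerate global minima can be checked numerically in dimension one, where the Laplace method predicts limiting weights proportional to the inverse square root of the second derivative at each minimum. The sketch below uses a hypothetical potential `V` with two global minima at ±1 of different curvature and a Gibbs density proportional to `exp(-beta * V)`; the potential, the parameter name `beta`, and the function names are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def V(x):
    # (x^2 - 1)^2 vanishes at both wells; h > 0 rescales the curvature of each well,
    # so that V''(-1) = 8 * h(-1) and V''(+1) = 8 * h(+1).
    h = 1.0 + 0.5 * x / np.sqrt(1.0 + x**2)
    return (x**2 - 1.0) ** 2 * h

def left_well_mass(beta, lo=-3.0, hi=3.0, n=400_001):
    """Mass that the density proportional to exp(-beta * V) gives to x < 0 (grid sum)."""
    x = np.linspace(lo, hi, n)
    w = np.exp(-beta * V(x))
    return w[x < 0].sum() / w.sum()

# Laplace prediction: weight of each minimum proportional to V''(x_i)^(-1/2)
h_left = 1.0 - 0.5 / np.sqrt(2.0)
h_right = 1.0 + 0.5 / np.sqrt(2.0)
predicted = h_left**-0.5 / (h_left**-0.5 + h_right**-0.5)

mass = left_well_mass(beta=400.0)
```

Comparing `mass` with `predicted` for increasing values of `beta` shows the Gibbs measure approaching a two-point measure whose weights favour the flatter well.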
We adapt to our setting the results of Hwang in the following proposition.
Proposition 8. The probability measure converges weakly to as goes to infinity. Moreover, the probability measure concentrates on the global minima of .
Proof. The result of Hwang shows that the probability measure converges weakly to as goes to infinity, and the probability measure concentrates on the global minima of . We combine this result with Lemma 7 to prove the proposition.
In order to show that converges in distribution to a measure supported on the global minima of , we need two more technical results. We mix the approaches initiated by Holley et al.  and Miclo . Indeed, we will use some functional inequalities and show that the free energy (corresponding to our process) decreases. We suppose in the following that for some sufficiently large (and the same proof actually reads when is asymptotically equivalent to ).
Definition 9. The maximal height of the function is the nonnegative function defined by where
Remark 10. The function corresponds to the maximum of all the minimal energies needed to go from each point of to .
The function is positive if and only if there exists more than one local minimum of .
Lemma 11. We have that , where is the maximal height function corresponding to .
Proof. Let . For any path , we easily have . Then, by definition of , we get As a consequence, there exists such that and the result follows.
A very important theorem permits one to relate the height function to the second eigenvalue of the infinitesimal generator of (i.e., the constant involved in the spectral gap inequality).
Theorem 12 (Jacquot [15, Theorem 1.1]). The invariant measure admits a spectral gap , meaning that there exist such that for all , one has for all continuous where . Moreover, .
Corollary 13. The family of probability measures satisfies a logarithmic Sobolev inequality , with .
Proof. Hölder’s inequality implies that the logarithmic Sobolev constant is smaller than the inverse of the spectral gap constant in Theorem 12.
We will now use some functional inequalities in order to prove the convergence of (and thus ) towards the global minima of . Let denote the density of the semigroup corresponding to the nonhomogeneous Markov process .
Theorem 14. Suppose that , where . Then, for all initial , the free energy converges to 0 as goes to infinity.
To prove Theorem 14, we need the following three technical results. We first state them all, postponing their proofs, and then deduce Theorem 14 from them. Let us state the first technical result.
Proposition 15. For all initial , , we get where we have denoted .
Lemma 16 (Miclo, [7, Lemma 6]). Let be a continuous function such that almost surely where and are two continuous nonnegative functions such that and . Then .
We now need a technical lemma to conclude that the free energy converges to 0.
Lemma 17. For all , the quantity is bounded.
We are now ready to prove Theorem 14.
Proof of Theorem 14. Let and . Consider the process , solution to the SDE We can rewrite the result of Proposition 15 in the following way, where we recall that denotes the distribution law of the process conditioned on We recall that outside a compact set and it is proved in  that . We therefore have . Moreover, the function is nondecreasing, while is nonincreasing. Thus, as , the two terms and do not play any role in the upper bound. It now remains to find an upper bound for . To this aim, we use Lemma 17. Indeed, there exist two positive constants such that We now use Lemma 16. We easily compute the time-derivative of : Using the explicit expression of ; that is , we have As is a nondecreasing function and because of the hypothesis on , the first term converges to 0 when goes to infinity. For the second term, we recall that is bounded and so because . Lemma 16 asserts that if satisfies and , then . For with the given condition on the constant , we meet the required conditions and the result follows.
Proof of Proposition 15. To shorten notation, let be the distribution law of the process , knowing that . We recall that the family of probability measures satisfies a logarithmic Sobolev inequality . We also have . Define , such that : By Corollary 13, there exists a constant such that We now have to compute the derivative of : We put this last estimate in the preceding inequality (43) and thus We recall that we are looking for an inequality including the time-derivative of the free energy . We have Our strategy is to find an upper bound for the two terms on the right-hand side. The Kolmogorov forward equation reads We also remark that we have the following estimates: where we have used the usual notation . Moreover, we also find Now put the first estimate (48), as well as the Kolmogorov equation (47), in the formula (46). We integrate by parts and use the logarithmic Sobolev inequality (43) to get On the other hand, we obtain the following equality for the second integral involved in the time-derivative of : We put all the pieces together and this leads to the result.
Proof of Lemma 17. Let be the compact set , where is a given positive constant. As converges weakly to , we only need to prove that is bounded. We have By Proposition 8, we know that , and so there exists a positive constant such that
We will now describe the law of the limit process .
Proposition 18. The speed of convergence of toward 0 is .
Proof. By Lemma 16, the speed of convergence is given by , with and . Integrating by parts, we find that is equivalent, when goes to infinity and to and thus the speed of convergence is of order . Finally, the speed of convergence of the relative Kullback information to zero is .
Remark 19. It has been known since the work of Freidlin and Wentzell  that the Gibbs measure satisfies a large deviation principle. Therefore, the speed of convergence of toward is exponential ().
Corollary 20. Suppose that , where . Then the process converges in distribution to a random variable which concentrates on the global minima of . Thus, the process converges in distribution to a random variable , which concentrates on the global minima of .
Proof. The Kullback information estimates the distance between and , as it is recalled in (15). The result follows as converges weakly to .
Remark 21. The function is supposed to decrease slowly to zero. This is why we obtain the convergence of to the global minima of . But if goes too fast to zero, that is, with , then may freeze in a local minimum. So, does not converge in that case.
3.2. Study of
We give necessary and sufficient conditions for the convergence in distribution of . As usual, we start to work with the process . In order to link this section with the preceding one, we recall that . It implies that we consider functions such that (asymptotically) .
Let us first recall a former result.
Theorem 22 (Chambeu and Kurtzmann [1, Theorem 5.5]). The process satisfies the pointwise ergodic theorem. This means that almost surely the empirical measure of converges weakly to a random measure, which is a convex combination of Dirac measures taken in the minimal points of . More precisely, there exist such that
We are now able to conclude the study of the asymptotic behaviour of the process .
Theorem 23. Suppose that . Then one of the following holds.
(1) If is a function such that , then converges in distribution to ;
(2) otherwise, diverges.
Proof. Suppose that is such that the integral converges almost surely. The celebrated Slutsky theorem asserts that for two sequences , of valued random variables, if and , then . To prove the result, we let , , and remark that
Suppose that is such that . By Theorem 22, we have that , and we now need to find the rate of convergence in order to conclude the proof. Moreover, by [1, Proposition 5.3], we know that the speed of convergence of the empirical mean of the time-changed process is . But we are looking for the speed of convergence for itself. Integrating by parts, we obtain that
Corollary 20 implies that the first right-hand term converges in distribution to 0 because . So it converges in probability to 0. It remains to prove the convergence of the second term. We recall that, up to a multiplicative positive constant, . Moreover, we also know that is almost surely bounded. So, the second right-hand term is upper bounded (up to a multiplicative positive constant) by
and the result is as follows: converges in distribution. And by Corollary 20, converges in distribution to which concentrates on the global minima of . So, converges in probability to 0.
To conclude, if satisfies , then does not converge and so diverges.
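The form of Slutsky's theorem invoked in the proof above is commonly stated as follows. The symbols $X_n$, $Y_n$, $X$, and the constant $c$ are notational assumptions made for this sketch.

```latex
X_n \xrightarrow{\;d\;} X
\quad \text{and} \quad
Y_n \xrightarrow{\;\mathbb{P}\;} c \ (\text{a constant})
\quad \Longrightarrow \quad
X_n + Y_n \xrightarrow{\;d\;} X + c .
```

In particular, with $c = 0$, adding a term that vanishes in probability does not change the limit in distribution.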
4. Convergence in Distribution of to a Random Variable
In this section, we will prove that if converges to 1 or 0 slowly enough, then the process converges in distribution to an identified limit. We will first study the case and prove rigorously the convergence of . Then, we will consider the case