Abstract
Given a Poisson process on a bounded interval, its random geometric graph is the graph whose vertices are the points of the Poisson process and in which an edge joins two points if and only if their distance is less than a fixed threshold. We compute explicitly the distribution of the number of connected components of this graph. The proof relies on inverting some Laplace transforms.
1. Motivation
As technology advances [1–3], one can expect a wide expansion of so-called sensor networks. Such networks represent the next evolutionary step in building, utilities, industry, home, agriculture, defense, and many other contexts [4].
These networks are built upon a multitude of small and cheap sensors, which are devices with limited transmission capabilities. Each sensor monitors a region around itself by measuring some environmental quantities (e.g., temperature, humidity), detecting intrusion, and so forth, and broadcasts its collected information to other sensors or to a central node. The question of whether information can be shared among the whole network is then of crucial importance. Mathematically speaking, sensors can be abstracted as points in a Euclidean space or a manifold. The region a sensor monitors is represented by a ball centered at the location of the sensor. In what follows, it is assumed that the broadcast radius, that is, the distance at which a sensor can communicate with another sensor, is equal to the monitoring radius. Two questions are then of interest: can any two sensors communicate using others as hopping relays, and is the whole region covered by all the sensors? The recent works of Ghrist and his collaborators [5, 6] show how, in any dimension, algebraic topology can be used to answer these questions. Their method consists in building the so-called simplicial complex associated to the configuration of points and the radius of communication. Then, simple algebraic computations yield the Betti numbers: the first Betti number, usually denoted by β₀, is the number of connected components; the second number, β₁, is the number of coverage holes. Thus, we have a satisfactory deployment whenever β₀ = 1 and β₁ = 0. In trying to pursue their work in random settings, we quickly realized that the dimension of the ambient space plays a key role. We therefore began with the analysis of dimension 1, which appeared to be the simplest situation. In this case, there is no need for algebraic topology, so we will not go further in the description of this line of thought, even though it was our first motivation.
In dimension 1, the only question of interest is that of connectivity, but it can take different forms. Imagine we are given an interval as a domain in which points are drawn. For a given radius, one can wonder whether the whole interval is covered by the balls centered at the points, or one can investigate whether every point between the first and the last sensor is covered. The second situation is less restrictive since we do not impose that the frontier of the interval be covered. Depending on the application we have in mind, both questions are sensible. A slightly different but somehow close problem is that of the circle: consider now that the points are dispatched along a circle of unit perimeter and ask again whether the circle is covered by the arcs centered at the points. Several years ago, this problem was thoroughly analyzed ([7] and references therein) for a fixed number of i.i.d. arcs over the circle. A closed-form formula can be given for the probability of coverage as a function of the number of arcs and of the common law of the arc lengths. Some variations of this problem have been investigated since then; see, for instance, [8]. More recently, in [9], algorithms are devised to determine whether a domain can be protected from intrusion by a "belt" of sensors (namely, a ring or the border of a rectangle). There is no performance analysis in this work, which is focused on algorithmic solutions for this special problem of coverage. Still motivated by applications to sensor networks, [10] considers the situation where sensors are actually placed in a plane, have a fixed radius of observation, and analyzes the connectivity of the trace of the covered region over a line. Some recent results of Kahle [11, 12] are only loosely linked to ours: the motivation is the same, studying the Betti numbers of some random simplicial complexes, but the results are only asymptotic and valid in dimension greater than 2.
Our main result is the distribution of the number of connected components for a Poisson distribution of sensors in a bounded interval. We could not use the method of [7] since the number of gaps does not determine the connectivity of the domain. For instance, one may have only one gap at the "beginning", which means that all the points are pairwise within the threshold distance and, thus, that the network is connected, or one may have only one gap in the "middle", which means that there is a true hole of connectivity.
Actually, our method is closely related to queueing theory. Indeed, clusters, that is, sequences of neighboring points, are the exact analogues of busy periods; see Section 2. As will appear below, our analysis turns out to be that of an M/D/1/1 queue with preemption: when a customer arrives during a service, it preempts the server, and, since there is no buffer, the customer who was in service is removed from the queueing system. This analogy led us to use standard tools of queueing theory: Laplace transforms and renewal processes; see, for instance, [13, 14]. This works perfectly, and, with a bit of calculus, we can compute all the characteristics we are interested in. It is worthwhile to note that a queueing model (namely, the M/G/∞ queue) also appears in [10].
The paper is organized as follows. Section 2 presents the model and defines the relevant quantities to be calculated. The calculations and analytical results are presented in Section 3; for our situation, we find results analogous to those of [7]. In Section 4, two other scenarios are presented: the number of incomplete clusters and the number of clusters on a circle. In Section 5, numerical examples are presented and analyzed.
2. Problem Formulation
Let a > 0. We assume that we are given a Poisson process, denoted by N, of intensity λ on [0, a]. Let x_1 < x_2 < ... be the atoms of N. We thus know that the inter-point distances x_{i+1} - x_i are i.i.d. and exponentially distributed with mean 1/λ. We fix a threshold ε > 0. Two points, located respectively at x and y, are said to be directly connected whenever |x - y| ≤ ε. Two points of N, say x_i and x_j with i < j, are indirectly connected if x_k and x_{k+1} are directly connected for every i ≤ k < j. A set of points pairwise directly or indirectly connected is called a cluster; a complete cluster is a cluster which begins and ends within [0, a]. The connectivity of the whole network is measured by the number of clusters.
The number of points in the interval is denoted by . The random variable given by represents the beginning of the i-th cluster, denoted by . In the same way, the end of this same cluster, , is defined by So the i-th cluster, , has a number of points given by . We define the length of as . The intercluster size, , is the distance between the end of and the beginning of , which means that , and is the distance between the first points of two consecutive clusters, given by .
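To fix ideas, the cluster decomposition above can be sketched in a few lines of Python. This is an illustration only: the function names (`poisson_points`, `clusters`) and the parameter values are ours, not the paper's, and the threshold `eps` plays the role of the connection distance.

```python
import random

def poisson_points(lam, a, rng):
    """Atoms of a Poisson process of intensity lam on [0, a], obtained by
    accumulating i.i.d. exponential inter-point distances."""
    points, x = [], rng.expovariate(lam)
    while x < a:
        points.append(x)
        x += rng.expovariate(lam)
    return points

def clusters(points, eps):
    """Split sorted points into clusters: maximal runs in which consecutive
    points are directly connected (distance at most eps)."""
    groups = []
    for x in points:
        if groups and x - groups[-1][-1] <= eps:
            groups[-1].append(x)   # directly connected to the previous point
        else:
            groups.append([x])     # gap larger than eps: a new cluster starts
    return groups

rng = random.Random(42)
pts = poisson_points(2.0, 10.0, rng)
cls = clusters(pts, 0.5)
# Beginning, end, and length of each cluster, as in the definitions above.
summary = [(c[0], c[-1], c[-1] - c[0]) for c in cls]
```

The beginnings, ends, lengths, and intercluster distances defined in the text can all be read off the `summary` list.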
Remark 2.1. With this set of assumptions and definitions, we can see our problem as an M/D/1/1 preemptive queue; see Figure 1. In this nonconservative system, the service time is deterministic and given by the connection threshold. When a customer arrives during a service, the served customer is removed from the system and replaced by the arriving customer. Within this framework, a cluster corresponds to what is called a busy period, the intercluster size is an idle time, and the distance between the first points of two consecutive clusters is the length of a cycle.
The number of complete clusters in corresponds to the number of connected components (since, in dimension 1, it coincides with the Euler characteristic of the union of intervals; see [5]) of the network. The distance between the beginning of the first cluster and the beginning of the i-th one is defined as . We also define . Figure 2 illustrates these definitions.
For the sake of completeness, we recall the essentials of Markov process theory needed in the sequel; for further details, we refer, for instance, to [13, 14]. In what follows, for a process X, (F_t) denotes the filtration generated by the sample paths of X.
Definition 2.2. A process X with values in a denumerable space E is said to be Markov whenever

E[f(X_{t+s}) | F_t] = E[f(X_{t+s}) | X_t], (2.4)

for any bounded function f from E to R, any t ≥ 0, and any s ≥ 0.
Equivalently, a process X is Markov if and only if, given the present (i.e., given X_t), the past (i.e., the sample path of X before time t) and the future (i.e., the sample path of X after time t) of the process are independent.
Definition 2.3. A random variable τ with values in [0, +∞] is an (F_t)-stopping time whenever, for any t ≥ 0, the event {τ ≤ t} belongs to F_t.
The point is that (2.4) still holds when the deterministic time t is replaced by a stopping time τ: given X_τ, the past and the future of the process are independent. The process is then said to be strong Markov. This property always holds for Markov processes with values in a denumerable space but is not necessarily true for Markov processes with values in an arbitrary space.
From now on, the Markov process under consideration is the Poisson process N of intensity λ over [0, a].
Lemma 2.4. For any , and are stopping times.
Proof. Let us consider the filtration . For , we have
Thus, is a stopping time. For , we have
so is also a stopping time. We proceed along the same line for others and as well for to prove that they are stopping times.
Since the Poisson process is a (strong) Markov process, the next corollary is immediate.
Corollary 2.5. The set is a set of independent random variables. Moreover, is distributed as an exponential random variable with mean , and the random variables are i.i.d.
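The exponential distribution in Corollary 2.5 rests on the memoryless property of the exponential inter-point distances: the gap that terminates a cluster, being conditioned to exceed the threshold, exceeds it by an amount that is again exponential. The following Python sketch (with illustrative parameter values of our own choosing) checks this numerically.

```python
import random

# The gap that ends a cluster is an exponential inter-point distance
# conditioned to exceed eps; by memorylessness, its excess over eps is
# again exponential with mean 1/lam.  (lam, eps are illustrative values.)
lam, eps = 1.0, 0.5
rng = random.Random(0)
gaps = [rng.expovariate(lam) for _ in range(200_000)]
excess = [g - eps for g in gaps if g > eps]
mean_excess = sum(excess) / len(excess)
frac_large = len(excess) / len(gaps)
# mean_excess should be close to 1/lam = 1.0,
# and frac_large close to exp(-lam * eps), about 0.6065
```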
3. Calculations
Theorem 3.1. The Laplace transform of the distribution of is given by
Proof. Since is an exponentially distributed random variable,
Hence, the Laplace transform of the distribution of is given by
Using Corollary 2.5, we have , which concludes the proof.
From this result, we can immediately calculate the Laplace transform of the distribution of . Since , we have , and using Corollary 2.5:
Corollary 3.2. The Laplace transform of the distribution of , for , is given by
Proof. We use Corollary 2.5 and Theorem 3.1 to calculate the Laplace transform of the distribution of since : hence, the result.
Let us define the function as that is, is the probability of having clusters in the interval . Since, for all , , the Laplace transform of with respect to is well defined.
Theorem 3.3. For any , the Laplace transform of is given by
Proof. We note that (see Figure 3)
Hence,
Let
then we have
for , where we used Corollary 2.5 in the third line. For , the Laplace transform is trivial and given by . Substituting (3.14) in the Laplace transform of both sides of (3.12) yields
The proof is, thus, complete.
Lemma 3.4. Let be a positive integer. For any , when , .
Proof. Since there is almost surely a finite number of points in , for almost all sample-paths, there exists such that for any . Hence, for , . This implies that tends almost surely to as goes to 0. Moreover, it is immediate by the very definition of that . Since, for any , is finite, the proof follows by dominated convergence.
Let Li_s, for s in the complex plane, be the polylogarithm function with parameter s, defined by Li_s(z) = Σ_{k≥1} z^k / k^s. For n a positive integer, consider the corresponding function; its Laplace transform is given by
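For |z| < 1, the defining series converges quickly, so the polylogarithm can be evaluated numerically by simple truncation. The sketch below (our own naming, not the paper's code) does exactly this and checks it against the classical special case Li_1(z) = -log(1 - z).

```python
import math

def polylog(s, z, terms=200):
    """Truncated series for the polylogarithm Li_s(z) = sum_{k>=1} z**k / k**s,
    adequate for |z| < 1."""
    return sum(z**k / k**s for k in range(1, terms + 1))

# Sanity check against the classical special case Li_1(z) = -log(1 - z).
approx = polylog(1, 0.5)
exact = -math.log(1 - 0.5)
```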
Corollary 3.5. Let be defined as follows: The Laplace transform of the n-th moment of is which converges, provided that .
Proof. Applying the Laplace transform of both sides of (3.17), we get concluding the proof.
We denote by S(n, k) the Stirling number of the second kind [15]; that is, S(n, k) is the number of ways to partition a set of n objects into k nonempty groups. The Stirling numbers are intimately related to the polylogarithm by the following identity (see [16]), valid for any positive integer n,
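Identities of this type are easy to check numerically. The sketch below (our own naming) computes the Stirling numbers of the second kind by the standard recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1) and compares a Stirling-number expression for the polylogarithm of negative integer order against the direct series; the precise form used here, Li_{-n}(z) = Σ_{k=0}^{n} k! S(n+1, k+1) (z/(1-z))^{k+1}, is a standard variant and may differ cosmetically from the identity quoted in the text.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k): number of ways to
    partition n objects into k nonempty groups (standard recurrence)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def polylog_neg(n, z):
    """Li_{-n}(z) via Stirling numbers:
    Li_{-n}(z) = sum_{k=0}^{n} k! * S(n+1, k+1) * (z/(1-z))**(k+1)."""
    w = z / (1 - z)
    return sum(factorial(k) * stirling2(n + 1, k + 1) * w**(k + 1)
               for k in range(n + 1))

def polylog_neg_series(n, z, terms=400):
    """Direct series Li_{-n}(z) = sum_{j>=1} j**n * z**j, for comparison."""
    return sum(j**n * z**j for j in range(1, terms + 1))
```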
Corollary 3.6. The n-th moment of the number of clusters on the interval is given by
Proof. Using (3.22) in the result of Corollary 3.5, we get where the coefficients are integers given by Using the following identity of the Stirling numbers [17], we find that for a positive integer. So, we can write the Laplace transform of the moments as and apply the inverse Laplace transform on both sides of (3.12) to obtain According to Lemma 3.4, when , we obtain Hence, for any , which shows that Thus, we have proved (3.23) for any positive integer .
Theorem 3.7. For any , , , and , we have
Proof. Since and since is finite for any , we have, for any , Rearranging the terms of the right-hand side, and substituting , by the result of (3.23), we obtain Furthermore, it is known (see [17]) that Hence, By inverting the Laplace transforms, we get where is the Dirac measure at point . After some simple algebra, we find the expression of the probability that an interval contains complete clusters: concluding the proof.
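Theorem 3.7 lends itself to a simulation cross-check. The sketch below estimates the distribution of the number of complete clusters by Monte Carlo; the completeness convention used (a cluster is complete when its last point lies at least the threshold away from the right end, so that the corresponding busy period finishes inside the interval) is our reading of the definition, and all names and parameter values are ours.

```python
import random
from collections import Counter

def split_clusters(points, eps):
    """Maximal runs of sorted points with consecutive gaps at most eps."""
    groups = []
    for x in points:
        if groups and x - groups[-1][-1] <= eps:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

def count_complete(points, a, eps):
    """Complete clusters: those whose busy period ends inside [0, a],
    i.e. whose last point is at least eps away from the right end
    (our reading of the completeness condition)."""
    return sum(1 for g in split_clusters(points, eps) if g[-1] + eps <= a)

lam, a, eps, trials = 1.0, 10.0, 1.0, 5000
rng = random.Random(7)
hist = Counter()
for _ in range(trials):
    pts, x = [], rng.expovariate(lam)
    while x < a:
        pts.append(x)
        x += rng.expovariate(lam)
    hist[count_complete(pts, a, eps)] += 1
probs = {k: v / trials for k, v in sorted(hist.items())}
```

The empirical probabilities in `probs` can then be compared with the closed-form distribution of the theorem.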
Lemma 3.8. For , has the three following properties: (i) is differentiable; (ii) ; (iii) .
Proof. Let be a nonnegative integer. The function is obviously differentiable when . Besides, we have Since the right-hand term, as a function of , is zero as well as its derivative for all , the function is also differentiable when , which proves (i). Items (ii) and (iii) are direct consequences of the final value theorem applied to the Laplace transform of and of its derivative.
The expression of gives us a Laplace pair between the and domains: We can use this relation to find the distributions of and .
Theorem 3.9. The probability density functions of and , denoted respectively by and , are given by where the expressions of and are straightforwardly obtained from (3.32).
Proof. According to Theorem 3.1,
Here, using the inverse Laplace transform established in (3.40) and remembering that , we get an analytical expression for , proving (3.41).
Proceeding in a similar fashion, we can find the distribution of by inverting its Laplace transform given by Corollary 3.2 as follows:
We, thus, have (3.42).
We can also obtain the probability that the segment is completely covered by the sensors. To do this, we remember that the first point (if there is one) is capable of covering the interval .
Theorem 3.10. Let be defined as follows: Then,
Proof. The condition of total coverage is the same as which means that Hence, and since and are independent The result then follows from Lemma 3.8 and some tedious but straightforward algebra.
4. Other Scenarios
The method can be used to calculate the distribution of the number of clusters under other definitions. We consider two other definitions: the number of incomplete clusters and the number of clusters on a circle.
4.1. Number of Incomplete Clusters
The major difference with Section 3 is that a cluster is now taken into account as soon as one of the points of the cluster is inside the interval . So, for instance, in Figure 3, the clusters counted include incomplete ones. We define as the number of incomplete clusters on an interval .
Theorem 4.1. Let be defined as for and . Then,
Proof. The condition of is now given by We define as Repeating the same calculations, we find the Laplace transform of : With this expression, following the lines of Lemma 3.4, we obtain Then, we write to find an expression with a well-known inverse Laplace transform, and, after inverting it, we obtain Expanding the Laplace transform of the distribution of in a Taylor series and rearranging terms, we get Now, we use another recurrence that the Stirling numbers obey [17], to get Hence, Inverting this expression for any nonnegative integer , we obtain the desired distribution.
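Counting incomplete clusters is also easy to mimic in simulation. In the sketch below (our own naming), points may lie outside the interval, and a cluster is counted as soon as one of its points falls inside, in the spirit of the definition above.

```python
def split_clusters(points, eps):
    """Maximal runs of sorted points with consecutive gaps at most eps."""
    groups = []
    for x in points:
        if groups and x - groups[-1][-1] <= eps:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

def count_incomplete(points, a, eps):
    """Clusters counted as soon as at least one of their points lies in
    [0, a]; the points themselves may extend outside the interval."""
    return sum(1 for g in split_clusters(sorted(points), eps)
               if any(0 <= p <= a for p in g))
```

For instance, with points at -0.1, 0.05, 0.5, and 1.2, interval [0, 1], and threshold 0.2, the clusters are {-0.1, 0.05}, {0.5}, and {1.2}; the first two intersect [0, 1], so two incomplete clusters are counted.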
4.2. Number of Clusters in a Circle
We now investigate the case where the points of the process are deployed over a circle, and we want to count the number of complete clusters, which corresponds to calculating the Euler characteristic of the total coverage; this is the quantity studied in this subsection. Without loss of generality, we can choose an arbitrary point to be the origin.
Theorem 4.2. The distribution of the Euler characteristic, , when the points are deployed over a circle of circumference , is given by
Proof. If there are no points on the circle, . Otherwise, if there is at least one point, we choose the origin at this point, and we have an equivalence between the events:
In Figure 4, we present an example of this equivalence.
We can define as
to find the Laplace transform of :
The number of clusters is almost surely equal to the number of points when , so
Expanding the Laplace transform in a Taylor series and rearranging terms, as we did previously, yields
Since
we can directly invert this Laplace transform, add the case where there are no points for , and the theorem is proved.
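The circular case can also be simulated. The sketch below (our own naming) counts connected components on a circle: each circular gap larger than the threshold separates two components, and if no gap exceeds the threshold the points form a single cluster wrapping around. Note that the paper's Euler characteristic is 0 when the whole circle is covered, a case this simple component count does not distinguish.

```python
def circle_clusters(points, circumference, eps):
    """Connected components of the geometric graph on a circle: two points
    are adjacent when their circular distance is at most eps."""
    n = len(points)
    if n == 0:
        return 0
    pts = sorted(points)
    gaps = [pts[i + 1] - pts[i] for i in range(n - 1)]
    gaps.append(circumference - pts[-1] + pts[0])  # wrap-around gap
    separating = sum(1 for g in gaps if g > eps)
    # if no gap exceeds eps, the points form one cluster around the circle
    return separating if separating > 0 else 1
```

For example, two points at 0.05 and 0.95 on a circle of circumference 1 with threshold 0.15 are adjacent through the wrap-around gap and form a single cluster.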
5. Examples
We consider some examples to illustrate the results of the paper. Here, the behavior of the mean and the variance of the number of clusters, as well as its distribution, is presented.
From (3.23), we have that is given by This expression agrees with intuition in that there are three typical regions given a fixed . When is much smaller than , the number of clusters is approximately the number of sensors, since connections between sensors are then unlikely, which can be seen from the fact that when . As we increase , the mean number of direct connections overcomes the mean number of sensors, and, at some value of , we expect that decreases, since adding a point is then likely to connect disconnected clusters. We remark that the maximum occurs exactly for , that is, when the mean distance between two sensors equals the threshold distance for them to be connected. At this maximum, takes the value of . Finally, when is too large, all sensors tend to be connected, and there is only one cluster, which even extends beyond , so there are no complete clusters inside the interval. This is immediate when we take the limit in the last equation. Figure 5 shows this behavior when and .
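The three regions described above are easy to reproduce by Monte Carlo. In the sketch below (our own naming and parameter values), the estimated mean number of complete clusters is small for a sparse process, maximal at an intermediate intensity, and vanishes again when the process is dense, in agreement with the discussion; the completeness convention (last point at least the threshold away from the right end) is our reading of the definition.

```python
import random

def mean_complete_clusters(lam, a, eps, trials, seed):
    """Monte Carlo estimate of the expected number of complete clusters:
    maximal runs with gaps at most eps whose last point is at least eps
    away from the right end (our completeness convention)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pts, x = [], rng.expovariate(lam)
        while x < a:
            pts.append(x)
            x += rng.expovariate(lam)
        for i, p in enumerate(pts):
            last_of_cluster = i == len(pts) - 1 or pts[i + 1] - p > eps
            if last_of_cluster and p + eps <= a:
                total += 1
    return total / trials

low = mean_complete_clusters(0.1, 10.0, 1.0, 2000, 1)    # sparse regime
mid = mean_complete_clusters(1.0, 10.0, 1.0, 2000, 1)    # intermediate
high = mean_complete_clusters(10.0, 10.0, 1.0, 2000, 1)  # dense regime
# expected ordering: mid is the largest, high is nearly zero
```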
The variance can also be obtained from (3.23) as follows: and, under the condition that
Figure 6 shows a plot of Var as a function of for and . We can expect that, when is small compared to , the plot should be approximately linear, since there would not be too many connections in the network, and the variance of the number of clusters should be close to the variance of the number of sensors, given by . Since tends almost surely to 0 when goes to infinity, Var should also tend to 0 in this case. These two properties are observed in the plot. Besides, we find the critical points of this function; again, is one of them, and at this value . The other two are the ones satisfying the transcendental equation: By using the second derivative, we realize that is actually a minimum. Besides, if , there is just one critical point, a maximum, at .
The last example in the section is based on the result obtained in Theorem 3.7. We consider again and to obtain the following distributions: These expressions are simple, and they have at most four terms, since . We plot these functions in Figure 7. The critical points on those plots at are explained by the fact that, as a function of for every , can be represented as a sum: where the coefficients are constant with respect to . However, has a critical point at for all , so this should also be a critical point of . If is small, we should expect that is close to one, since it is likely to have no points. For this reason, in this region, for is small. When is large, we expect to have very large clusters, likely to be larger than , so it is unlikely to have a complete cluster in the interval, and, again, approaches unity, while for become small again.
Acknowledgment
The authors would like to thank the anonymous referee whose constructive remarks helped us to improve the presentation of this paper.