Abstract
Distributed clustering is widely used in ad hoc deployed wireless networks. Distributed clustering algorithms like DMAC, HEED, MEDIC, ANTCLUST-based, and EDCR produce well-distributed Cluster Heads (CHs) using dependent thinning techniques where a node’s decision to be a CH depends on the decision of its neighbors. An analytical technique to determine the cluster density of this class of algorithms is proposed. This information is required to set the algorithm parameters before a wireless network is deployed. Simulation results are presented in order to verify the analytical findings.
1. Introduction
Distributed clustering is a robust technique used to organize ad hoc deployed wireless nodes to form a communication network [1]. It is widely adopted in energy-constrained ad hoc deployed wireless sensor networks (WSNs) [2]. Distributed clustering algorithms used in ad hoc deployed wireless networks can be broadly categorized into two classes. The first class consists of independent randomized cluster head (CH) selection algorithms; that is, the decision for a node to be a CH is made independently of the decisions of its neighboring nodes. For example, algorithms such as LEACH [3], LEACH-D [4], SEP [5], and EDAC [6] fall into this class. These algorithms do not produce well-distributed CHs [7]; they may produce two or more adjoining nodes as CHs. Furthermore, for these algorithms the actual number of CHs obtained after deployment varies considerably from the theoretical expected number of CHs [8]. The second class consists of distributed clustering algorithms like DMAC [1], HEED [9], ANTCLUST based [10], MEDIC [11], EDCR [12], and its derivatives [13]. For these algorithms, a node’s decision to be a CH depends on its neighbors’ decisions as well. This ensures that no two CHs appear in each other’s neighborhood and that every node either has at least one CH in its neighborhood or is itself a CH. These algorithms produce well-distributed clusters using dependent decision making and are referred to as the Dependent Thinning Distributed Clustering (DTDC) class of algorithms. We note that the CH selection process of the DTDC class of algorithms resembles a reverse price auction, sometimes known as the Dutch auction method [11].
Irrespective of the distributed clustering algorithm used in ad hoc deployed wireless network applications, the expected number of clusters, denoted by , is an important parameter required at the planning stage of the network. For example, consider a WSN where data is collected periodically and aggregated at the CH, then communicated to the base station (BS). The application may expect number of clusters, where each cluster has an expected number of nodes, denoted by , in the given deployment area . This requirement is generated based on the level of reliability expected from the collected data. That is, the reliability is directly connected to the redundancies associated with the nodes within a cluster [14]. Another example is an ad hoc deployed wireless network application that is required to produce an optimal number of clusters in order to minimize the energy cost for communication and maximize the network lifetime [15, 16]. In both these examples, the WSN parameters should be set appropriately at the initial deployment stage so that, when in operation, the desired number of clusters is achieved to meet the design objective. In the first example, this objective is increased reliability, whereas in the second example it is to maximize the lifetime.
To illustrate the importance of knowing WSN design parameters, let us look at a known example like the LEACH algorithm from the first class. LEACH uses a parameter which represents the expected proportion of nodes to be CHs. That is, a node has probability of becoming a CH independent of the decision of its neighbors. When an independent randomized clustering algorithm like LEACH is applied to an ad hoc deployed network where nodes are deployed uniformly at random in a given area, the expected number of clusters can be found using the expression [3]. According to [17], the node distribution of such a system is considered to be a 2-D Poisson point process with intensity , and the resultant CHs too would be distributed as a 2-D Poisson point process with intensity (i.e., CH density) . We see that by setting the WSN parameters and we can achieve a desired . That is, the analytical expressions presented play an important part in achieving the desired .
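The independent selection rule described above can be checked numerically. The following is a minimal sketch, not the LEACH protocol itself: it assumes each of `num_nodes` nodes elects itself with the same probability `p`, so that the mean CH count should be close to the product `num_nodes * p`. The function names and the example values are hypothetical.

```python
import random

def independent_ch_selection(num_nodes, p, rng):
    """Count CHs when every node elects itself independently with probability p."""
    return sum(1 for _ in range(num_nodes) if rng.random() < p)

def mean_ch_count(num_nodes, p, trials=2000, seed=1):
    """Average CH count over many independent deployments."""
    rng = random.Random(seed)
    counts = [independent_ch_selection(num_nodes, p, rng) for _ in range(trials)]
    return sum(counts) / trials
```

For instance, `mean_ch_count(300, 0.05)` should land near 15, although any single deployment can deviate noticeably from that mean, which is the variability noted in the Introduction.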
To the authors’ knowledge, no such analysis exists to determine the CH distribution and density () of the DTDC class of algorithms. However, Bettstetter [18] has presented an empirical formula for the CH density of the DMAC algorithm using simulation results. As it is an empirical formula, it cannot be generalized. In this paper we present an analytical expression for the CH density of the DTDC class of algorithms in order to address this gap.
In what follows, we will first establish that DTDC algorithms such as HEED, ANTCLUST based, DMAC, MEDIC, and EDCR fall into one common category in terms of their CH distribution. Then, we will determine the probability distribution of the cluster area of the DTDC class of algorithms. Subsequently, the distribution of the cluster area will be used to derive the cluster density. Furthermore, we will also consider the boundary (or border) effect due to the finite geographical area in which the nodes are distributed and modify the expressions to accommodate it. The proposed analytical results will show that the empirical results derived using simulations by Bettstetter in [18] are indeed accurate.
The rest of the paper is organized as follows. Section 2 presents the nomenclature. Section 3 provides a mathematical model to express the CH selection and distribution common to all DTDC algorithms. In Section 4, this model is used to identify the probability distribution of the cluster area of the DTDC class of algorithms. Subsequently, in Section 5, these results are used to find the cluster density and the expected number of clusters in rectangular and circular deployment areas. Simulation results presented in Section 6 establish that the analytical findings are in line with the actual values presented in existing literature. Section 7 presents the conclusion.
2. Nomenclature
Table 1 gives the notations used in what follows. Some are extracted from [12].
3. Preliminaries
This section presents the background necessary to find the CH density of DTDC class algorithms covering HEED, DMAC, ANTCLUST based, MEDIC, and EDCR. As mentioned before, these algorithms produce well-distributed CHs by making a node’s decision to be a CH depend on the decisions of other nodes in its neighborhood.
We assume that there are number of uniform-randomly distributed nodes in a given deployment area resulting in a 2-D Poisson point distribution of intensity , where [17] in our analysis. Furthermore we assume that all clusters are well populated; that is, each cluster consists of a large number of less reliable low cost nodes which work collaboratively to achieve reliable results. Hence, where is a random variable denoting the cluster area. According to the DTDC class of algorithms, the area covered by a CH candidacy message is given by , where represents the maximum distance a CH candidacy announcement message would reach. Since ,
The following common features exist in the DTDC class of algorithms.
(a) The DTDC class of algorithms does not allow two CHs to be within a distance . Furthermore, it ensures that every node is either discovered by a CH (i.e., there is a CH within a distance of a regular node) or is itself a CH.
(b) Each node calculates a time at which it will broadcast the CH candidacy announcement, provided it has not heard a similar message from a neighbor by this time. The calculation of is algorithm specific. However, all algorithms ensure that is inversely proportional to the fitness of a node to be a CH. For example, in the EDCR algorithm, is inversely proportional to the relative residual energy level of a node [12]. As such, the node with the highest fitness to become a CH will have the lowest , causing it to announce its CH candidacy first and become the CH for that neighborhood.
(c) All the algorithms use a random component for tiebreaking. Hence, when all nodes are equally fit to be CHs, is purely random. This is true for the EDCR, HEED, MEDIC, and ANTCLUST algorithms at the initial deployment stage, since all nodes have equal energy.
The above features of the DTDC class of algorithms reaffirm that the selected CHs represent a dependent thinning point process on the original 2-D Poisson point process. Let represent the set of all deployed nodes, where with . The clustering process yields a random set of secondary points which are CHs with the property that , where and . Note that are the regular (non-CH) member nodes. For any node we have and for at least one CH node . Further, it should be noted that is a member of the cluster with CH when .
According to [19], the aforementioned dependent thinning point process follows a Matérn Type III process when is a purely random value. Hence, we can conclude that the CH distribution of dependent thinning algorithms like HEED, ANTCLUST, DMAC, MEDIC, and EDCR immediately after deployment would resemble a Matérn Type III point process.
Example 1. Figure 1 gives a simplified description of Matérn Type III process applied to 3 random nodes , , and with , , , , , , and .
According to this illustration, since , eliminates ; since is eliminated, even though , would not be eliminated; hence, nodes and will be elected as CHs. Even though the above description clearly indicates that the DTDC class of algorithms resembles a Matérn Type III process, we cannot find the resultant CH density (or expected number of clusters) using this information. As Bertil Matérn has shown in [20], the point distribution of the Matérn Type III dependent thinning process is mathematically intractable.
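The elimination rule in Example 1 can be sketched in a few lines of code. This is an illustrative model only, not the implementation of any of the cited algorithms: it assumes every node draws a purely random timer (the equal-fitness case), nodes announce in timer order, and a node is suppressed only by an already elected CH within the broadcast range. The names `dtdc_cluster_heads`, `points`, and `radius` are hypothetical.

```python
import math
import random

def dtdc_cluster_heads(points, radius, rng):
    """Return indices of nodes elected as CHs under random-timer dependent thinning."""
    # One random timer per node; equal fitness makes the order purely random.
    timers = [rng.random() for _ in points]
    order = sorted(range(len(points)), key=timers.__getitem__)
    heads = []
    for i in order:
        x, y = points[i]
        # A node stays silent only if it has heard an elected CH within range.
        # Nodes that were themselves suppressed never suppress anyone.
        heard = any(
            math.hypot(x - points[h][0], y - points[h][1]) <= radius
            for h in heads
        )
        if not heard:
            heads.append(i)
    return heads
```

By construction, no two elected CHs lie within `radius` of each other, and every non-CH node has at least one CH within `radius`, which are the two properties listed in feature (a).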
Based on this background, we will derive the CH density of the class of DTDC algorithms by finding the probability density function (p.d.f) of for practical cases satisfying (1) in the next section.
4. Probability Density Function of Cluster Area
Based on our analysis, we observe that the probability of depends on the following two scenarios.
(1) For a given cluster area, there are no uncovered nodes in its cluster neighborhood (an uncovered node is a node that has not heard from a neighboring CH almost at the end of a new CH candidacy announcement time interval).
(2) The chance of having no such uncovered nodes.
Let be the probability that no uncovered nodes exist in a given cluster neighborhood. Then the conditional probability denotes the cluster area given that no uncovered nodes exist in a given cluster neighborhood. Based on these facts, we find the probability of a resultant cluster area when no uncovered nodes exist. One finds that where .
We use Figures 2, 3, and 4 to explain (2). Please note that the radius of each disk is in all the figures.
According to the class of DTDC algorithms, the smallest possible cluster area results whenever a given CH’s neighboring CHs sit on the perimeter of its CH broadcasting coverage disc of radius , since no two CHs can be selected within each other’s CH broadcasting range . This situation is shown in Figure 2.
Hence, we can write
In other words, Figure 2 shows the highest possible CH density (number of CHs in a given unit area). According to the DTDC class of algorithms, we can expect cluster area sizes between a smallest of and a largest of , provided that there are no uncovered nodes in the cluster neighborhood. Therefore, we can write where .
Further, when we have close-packed clusters (smallest as shown in Figure 2 and largest as shown in Figure 3), there cannot be any uncovered areas. On the other hand, when cluster area , there can be uncovered nodes in its neighborhood, since there can be uncovered neighboring regions as shown in Figure 4.
represents the probability that there are no uncovered nodes in a given cluster (with area ) neighborhood. This can be expressed by where is any uncovered area formed by the cluster setup as shown in Figure 4. We can show that the neighboring clusters are close packed when the cluster area, . In other words, there is no uncovered area, resulting in for . As a result, the probability that there would not be any uncovered nodes is given by
According to (5), is an exponentially decaying function when . Now let us consider Figure 5. This is a special case of Figures 3 and 4 where nodes 0 and 6 are placed distance apart. According to Figure 5, there is a chance for a node to be in the uncovered area shaded in gray. The cluster area of Figure 5 can be expressed as
This is only 3.49% bigger than the size of the cluster area shown in Figure 3. The uncovered area of Figure 5 is where, in general, represents an area of a triangle , and represents an area of a sector . Since , , , , and , we can derive .
We have shown that in (1). Therefore, if we consider a WSN with 100 nodes in a given node neighborhood, then and the resultant . On the other hand, when the neighborhood contains 200 nodes, this will be further reduced to . Hence, we can conclude that where.
Therefore, we can approximate that provided that . Hence, once we combine (2), (3), (4), and (10) we obtain that Therefore,
The resultant cluster areas of the DTDC class of algorithms have an equal chance to be anywhere in the interval , because all nodes have an equal chance to get the lowest when they have equal fitness to be a CH. This results in the cluster area p.d.f. being uniform. Hence, provided that .
Thus far we have derived the p.d.f. of the cluster area . This result will be used in deriving the expected cluster density in the subsequent section.
5. Derivation of Expected Cluster Density
In this section, we will derive the expected cluster density (or CH density as each cluster is served by one and only one CH) for the class of DTDC algorithms.
Let us define as the probability that a randomly chosen node is a CH. Thus,
We note that when and is a random variable with a p.d.f of , then the p.d.f of is given by
We can write the p.d.f of random variable , using (13) as,
According to (14), , where is the total number of CHs at a given moment, and is the total number of nodes. Hence , the expected probability that a given node is a CH, can be given as where is the CH density. So we have
According to (19), we can expect a 0.5018 fraction of nodes belonging to a given CH’s broadcasting range neighborhood to join its cluster.
Further, using (18) and (19), we can show that
Hence, we can conclude that the expected CH density is independent of the node density provided that .
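The numerical value 0.5018 quoted above can be reproduced with a short worked calculation. The following is a reconstruction under stated assumptions, not a copy of the paper's equations: it assumes the cluster area is uniform between the two hexagonal close-packing extremes (neighboring CHs on the perimeter of the broadcast disc, and a regular hexagon inscribed in that disc). The symbols A, R, λ, μ, and p are assumed notation for the cluster area, the CH candidacy broadcast range, the node density, the CH density, and the probability that a node is a CH.

```latex
% Assumed notation: A = cluster area, R = CH candidacy broadcast range,
% \lambda = node density, \mu = CH density, p = P(node is a CH).
\begin{align*}
A_{\min} &= \tfrac{\sqrt{3}}{2}R^{2}, \qquad A_{\max} = \tfrac{3\sqrt{3}}{2}R^{2}
  && \text{(assumed hexagonal bounds)}\\
f_A(a) &= \frac{1}{A_{\max}-A_{\min}} = \frac{1}{\sqrt{3}\,R^{2}},
  \qquad A_{\min}\le a\le A_{\max}\\
E\!\left[\frac{1}{A}\right]
  &= \int_{A_{\min}}^{A_{\max}} \frac{1}{a}\,\frac{da}{\sqrt{3}\,R^{2}}
   = \frac{\ln 3}{\sqrt{3}\,R^{2}} \approx \frac{0.6343}{R^{2}}\\
E[p] &= \frac{1}{\lambda}\,E\!\left[\frac{1}{A}\right]
  \quad\Longrightarrow\quad
  \mu = \lambda\,E[p] = \frac{\ln 3}{\sqrt{3}\,R^{2}}\\
\frac{1}{\mu\,\pi R^{2}} &= \frac{\sqrt{3}}{\pi \ln 3} \approx 0.5018
\end{align*}
```

Under these assumptions the node density cancels in the CH density, and the mean cluster area is about 0.5018 of the broadcast disc area, which agrees with the fraction stated in the text.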
Observation 1. The result obtained in (20) matches with the empirical formula proposed by Bettstetter in [18] where and . When the empirical formula proposed by Bettstetter reduces to
In the analysis thus far we have ignored the influence of the node deployment region boundary and its effects. In what follows, we will analyze the boundary effect. The CHs closest to the boundary do not have any neighboring CHs beyond the boundary; that is, nodes at the boundary have a higher isolation probability even though all the nodes are uniformly distributed within the deployed area. Hence, CHs are more likely to be found at the boundary. This was observed and confirmed in [18].
We can use (17) and (19) to derive the expected number of clusters to be formed assuming that the boundary effect does not exist. In other words, we have relaxed the reality that there can be more CHs close to the boundary compared to the rest of the area. Thus, where is the expected number of nodes in any given CH’s broadcasting range . That is, in (22), we have not considered the boundary effect. In what follows, we will derive considering the boundary effect for frequently considered node deployment region shapes, namely, a rectangular region and a circular region. Subsequently we will use these results to obtain accounting for the boundary effect.
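As a rough planning aid, the boundary-free estimate can be turned into a pair of helper functions. This is a sketch under assumptions rather than a transcription of (22): it takes the 0.5018 fraction from the text, assumes the expected number of nodes in a broadcast range is the node count times the ratio of the disc area to the deployment area, and ignores the boundary entirely. The function names and the constant name `FRACTION` are hypothetical.

```python
import math

# Fraction of the nodes within a CH's broadcast range that join its cluster,
# as quoted in the text.
FRACTION = 0.5018

def expected_clusters_no_boundary(area, radius):
    """Expected cluster count when the boundary effect is ignored.

    With N nodes, nodes-in-range is assumed to be N * pi * radius^2 / area,
    so N cancels and only the geometry remains.
    """
    return area / (FRACTION * math.pi * radius ** 2)

def radius_for_clusters_no_boundary(area, n_clusters):
    """Broadcast range that yields n_clusters, again ignoring the boundary."""
    return math.sqrt(area / (FRACTION * math.pi * n_clusters))
```

Because the node count cancels, the estimate depends only on the area and the broadcast range, in line with the observation that the CH density is independent of the node density for well-populated clusters.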
5.1. Boundary Effect on due to a Rectangular Deployment Area
We derive for a rectangular region with dimensions and ad hoc deployed nodes. For this scenario, the probability () that two uniformly distributed nodes are within each other’s CH candidacy broadcasting range is given by the integral where is the p.d.f. of the distance between two nodes that are independently and uniformly distributed (at random) in a rectangular area of size , where . According to [21], is given by
Further, when there are N(≫1) uniformly distributed nodes in the deployment region, we can expect nodes in a given CH neighborhood of radius , where is given by
Hence, using (23)–(25), we can derive
Therefore, when (26) is used with (22), we can derive the expected number of CHs. Thus,
As we have already discussed, deriving the CH candidacy broadcasting range for a desired is a salient requirement in most applications. Hence rearranging (27), we obtain that
By solving (28), we can derive for a given ad hoc network setup for a rectangular deployment region with the desired number of clusters, provided that .
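The design step described above (choosing the broadcast range for a desired cluster count in a rectangle) can be sketched numerically. Several parts of this sketch are assumptions, since the corresponding equations are not reproduced here: the closed form used for the probability that two uniform points in an `a` by `b` rectangle lie within distance `r` is the standard result obtained from the triangular densities of the coordinate differences, valid for `r` up to the shorter side, and it may differ in form from the expression in [21]; the expected number of nodes in range is taken as the node count times that probability; and the 0.5018 fraction is taken from the text. All function names are hypothetical.

```python
import math

FRACTION = 0.5018  # fraction of in-range nodes joining a cluster (from the text)

def prob_within_range_rect(a, b, r):
    """P(distance <= r) for two independent uniform points in an a-by-b rectangle.

    Closed form valid for 0 <= r <= min(a, b); the first term is the
    boundary-free value and the remaining terms are boundary corrections.
    """
    if not 0 <= r <= min(a, b):
        raise ValueError("closed form assumes 0 <= r <= min(a, b)")
    return (math.pi * r ** 2 / (a * b)
            - 4.0 * (a + b) * r ** 3 / (3.0 * a ** 2 * b ** 2)
            + r ** 4 / (2.0 * a ** 2 * b ** 2))

def radius_for_clusters_rect(a, b, n_clusters, tol=1e-9):
    """Bisection for the range R such that FRACTION * n_clusters * P(R) = 1."""
    target = 1.0 / (FRACTION * n_clusters)
    lo, hi = 0.0, min(a, b)
    if prob_within_range_rect(a, b, hi) < target:
        raise ValueError("target not reachable with R <= min(a, b)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # P is a cumulative distribution in r, so it is non-decreasing.
        if prob_within_range_rect(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since the boundary lowers the in-range probability for a given range, the range returned here is somewhat larger than the boundary-free value for the same area and cluster count.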
5.2. Boundary Effect on due to a Circular Deployment Region
Let us now derive for a circular deployment region. We follow the same approach as in the rectangular deployment region case. Let us assume that the ad hoc deployed wireless node network consists of nodes deployed uniformly at random in the circular deployment region of radius resulting in .
The expected number of neighboring nodes in a given CH’s CH candidacy broadcasting range , for a circular deployment area with radius , is also given by (25). Note that the given in (23) is still applicable. However, , that is, the p.d.f. of the distance between two nodes that are independently and uniformly distributed (at random) in a circular area with radius , is given by according to [22]. Hence, we can write the of a given circular area with radius as where
Thus, we can determine for a given circular deployment area with radius for an expected number of clusters by solving the reordered (30).
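For the circular region, the in-range probability can be evaluated numerically. The sketch below uses the standard density of the distance between two independent uniform points in a disc, which is assumed here and may differ in form from the expression in [22], and integrates it with a simple midpoint rule. The function names and the `steps` parameter are hypothetical.

```python
import math

def pdf_distance_disc(d, a):
    """p.d.f. of the distance d between two independent uniform points
    in a disc of radius a (zero outside 0 <= d <= 2a)."""
    if d < 0 or d > 2 * a:
        return 0.0
    u = d / (2 * a)
    return (4 * d / (math.pi * a ** 2)) * (math.acos(u) - u * math.sqrt(1 - u * u))

def prob_within_range_disc(r, a, steps=20000):
    """P(distance <= r), by midpoint-rule integration of the p.d.f."""
    upper = min(r, 2 * a)
    h = upper / steps
    return h * sum(pdf_distance_disc((k + 0.5) * h, a) for k in range(steps))
```

The result can be compared with the boundary-free ratio `(r / a) ** 2`; the integrated value is smaller, reflecting that nodes near the edge have part of their broadcast disc outside the region.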
Note 1. We derived assuming that is a random variable. This is true only for the situation where all nodes have equal fitness to be a CH; that is, the residual energies of all the nodes are the same. This is in fact true for the HEED, ANTCLUST, and EDCR algorithms during initial deployment, under the assumption that the sensors are ideal. However, in subsequent rounds, would be weighted based on each node’s residual energy level at the beginning of the cluster formation. That is, the node with the highest residual energy would be the CH in a given neighborhood. We know that the node closest to a CH spends the minimum energy in communication. As a result, it would be the highest-energy node in that neighborhood at the beginning of the subsequent CH selection phase. Hence, in a subsequent round, the CHs would be the nodes closest to the previous CHs. Thus, we can expect, on average, the same number of clusters to be formed in subsequent reclustering rounds. As a result, (28) will be valid for all subsequent rounds as well.
In this section, we presented an analytical technique to find the Cluster/CH density of DTDC class of algorithms. Further, we derived the expected number of clusters in a finite area considering the boundary effect. In what follows, we compare the analytical results with simulation experiment results.
6. Simulation Results
In this section, the proposed analytical method to determine the cluster density and expected number of clusters for the DTDC class of algorithms is evaluated using MATLAB simulations. It is already established that the proposed analytical results match the empirical results derived using the DMAC algorithm in [18]. For comparison, the simulation results for the HEED, ANTCLUST, and EDCR algorithms are presented as well. The results are presented based on the following design scenarios.
(1) Design requirement of 20 clusters, each with 15 nodes, monitoring a square area of . That is, 300 nodes should be deployed in this region. According to (28), the computed broadcasting distance is m to achieve the 20-cluster requirement.
(2) Design requirement of 30 clusters, each with 20 nodes, monitoring a rectangular area of . That is, 600 nodes should be deployed in this region. According to (28), the computed broadcasting distance is m to achieve the 30-cluster requirement.
(3) Design requirement of 20 clusters, each with 20 nodes, monitoring a circular area with radius 200 m. That is, 400 nodes should be deployed in this region. According to (30), the computed broadcasting distance is m to achieve the 20-cluster requirement.
The simulation results related to the above-described scenarios are given in Table 2. H1, A1, and E1 denote the results of the HEED, ANTCLUST, and EDCR algorithms, respectively, for scenario 1 (square area). Similarly, H2, A2, and E2 represent the results for scenario 2 (rectangular area), and H3, A3, and E3 represent the results for scenario 3 (circular area). Note that denotes the desired number of clusters in each case. The average and standard deviation (AV ± SD) of the actual number of clusters () obtained via a large number of different random node deployment simulations corresponding to each scenario have been tabulated. The tabulated in column “Beginning” corresponds to the cluster formation results at the initial deployment stage with a fresh set of homogeneous energy nodes, column “End” corresponds to the average number of clusters closer to the end of life of the sensor bed (we used 95% nodes alive as the lifetime measurement [12]), and column “Middle” corresponds to the average number of clusters at a position halfway between the “Beginning” and “End” scenarios. Further, the cumulative average of these three cases is presented in the column “Overall”.
The results given in Table 2 show us that the analytical estimation for based on cluster requirement is indeed valid as only a minimal variation of is seen in all simulation results. These results (based on HEED, ANTCLUST and EDCR algorithms) and independent simulation results of DMAC algorithm (and its corresponding empirical formula) given in [18] affirm the validity and applicability of the proposed analytical technique in determining the cluster density and the expected number of clusters of DTDC class of algorithms.
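An AV ± SD figure of the kind tabulated above can be produced with a small Monte Carlo harness. This sketch is not the MATLAB setup used for Table 2 and does not model HEED, ANTCLUST, or EDCR specifically: it assumes a square region, equal-fitness nodes (the initial deployment case), and the generic random-order election rule in which a node becomes a CH unless an already elected CH lies within the broadcast range. The function names and default values are hypothetical.

```python
import math
import random
import statistics

def count_cluster_heads(num_nodes, side, radius, rng):
    """Number of CHs elected in one random deployment on a side-by-side square."""
    # Points are i.i.d. uniform, so list order already acts as a random timer order.
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_nodes)]
    heads = []
    for x, y in pts:
        if all(math.hypot(x - hx, y - hy) > radius for hx, hy in heads):
            heads.append((x, y))
    return len(heads)

def av_sd(num_nodes, side, radius, runs=200, seed=3):
    """Mean and sample standard deviation of the CH count over many deployments."""
    rng = random.Random(seed)
    counts = [count_cluster_heads(num_nodes, side, radius, rng) for _ in range(runs)]
    return statistics.mean(counts), statistics.stdev(counts)
```

Calling `av_sd` with a node count, region size, and broadcast range taken from a design scenario gives a mean and spread that can be set against the desired cluster count for that scenario.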
As can be seen from Table 2, all major algorithms in the DTDC class respond in a similar manner. Hence, without loss of generality, the EDCR algorithm is selected from this class for further analysis. The analysis uses 15 different hypothetical node deployment requirements (cases), which cover the applicability of the analytical method to square, rectangular, and circular deployment regions, with different expected numbers of clusters for a given deployment region and different expected numbers of nodes per cluster. These requirements are listed in Table 3. The case number links each node deployment requirement to the tabulated test results of Table 4. The column under the heading “Area” presents the dimensions of the node deployment region (e.g., for a rectangular region and for a circular region), while the remaining columns present the expected number of clusters , the expected number of nodes in a cluster , and the total number of nodes to be deployed in the region . The last column presents the calculated for each case using either (28) or (30), depending on the shape of the region.
Table 4 shows the simulation results of the deployment requirements listed in Table 3. It presents the average and standard deviation (AV ± SD) of the actual number of clusters observed over a large number of different random node deployments corresponding to each case. The results tabulated in Table 4 indicate that the proposed analytical technique for estimating for a desired number of clusters is indeed an accurate method to realize the actual number of clusters. Furthermore, it can be noted that there is minimal variation in irrespective of the shape of the deployment region (rectangular, square, or circular), the desired number of clusters, and the expected member population in each cluster, provided that all clusters are well populated.
The simulation results presented thus far clearly show the applicability of the proposed analytical technique in estimating the expected number of clusters of the DTDC class of algorithms provided that each cluster is well populated, that is, . In order to identify a minimum threshold for or the expected number of nodes in a cluster for a given application requirement, the behavior of curves representing the average number of actual clusters versus different node densities, for different CH broadcasting ranges, can be observed. Figures 6 and 7 present these curves ( versus ) of the EDCR algorithm applied to a square deployment region with size and a circular deployment region with radius m, respectively. Both of these graphs consist of versus curves for 25, 30, 35, 40, and 45. The expected number of clusters, , calculated using (28) and (30) for Figures 6 and 7, respectively, is plotted as a vertical dotted line for each .
Figures 6 and 7 clearly indicate that all the versus curves are asymptotic and close to the expected number of clusters, . The vertical solid error bars marked on each line show the 5% (short) and 10% (long) levels below the ; the corresponding values of are 30 and 20, respectively. It was already identified in Section 5 that a 0.5018 fraction of the nodes belonging to any given CH’s broadcasting range neighborhood () join its cluster. Therefore, the proposed analytical technique can be used to determine the CH candidacy broadcasting range, , of the DTDC class of algorithms with a maximum error of 10% for a required expected number of clusters, , when the expected number of nodes in a cluster, , is more than 10. The number of nodes in a cluster is well above this figure in most practical applications.
The simulation results presented above and the empirical formula derived from simulation experiments in [18] affirm the accuracy of the proposed analytical method in determining for a given expected number of clusters, , of the DTDC class of algorithms at the network planning stage.
7. Conclusion
Distributed clustering is a popular technique for organizing ad hoc deployed wireless networks, including WSNs. We found that clustering algorithms like DMAC, HEED, ANTCLUST, MEDIC, and EDCR can be categorized into the class of DTDC algorithms based on the common underlying Dutch auction principle in CH selection, which results in a similar CH distribution. In this research, we have provided an analytical framework which can be used to derive the cluster density, , for a given deployment requirement where each cluster is assumed to be well populated. Furthermore, the analysis framework has been extended to include the effects of the boundary resulting from a finite deployment region when computing the expected number of clusters. The proposed analytical technique was verified via simulation experiments, and the results were presented. Further, the empirical formula proposed by Bettstetter in [18] independently verifies the accuracy of the proposed technique and vice versa. The authors feel that this analytical framework can be extended in future research to derive for any generic situation given by the Matérn Type III dependent thinning point process [20].