This paper describes the design and implementation of a distributed self-stabilizing clock
synchronization algorithm based on the biological example of Asian Fireflies. Huge swarms
of these fireflies use the principle of pulse coupled oscillators in order to synchronously emit
light flashes to attract mating partners. When applying this algorithm to real sensor networks,
typically, nodes cannot receive messages while transmitting, which prevents the networked nodes
from reaching synchronization. In order to counteract this deafness problem, we adopt a variant
of the Reachback Firefly Algorithm to distribute the timing of light flashes in a given time window
without affecting the quality of the synchronization. A case study implemented on 802.15.4 Zigbee
nodes presents the application of this approach for a time-triggered communication scheduling
and coordinated duty cycling in order to enhance the battery lifetime of the nodes.
1. Introduction
In South-East Asia, huge swarms of fireflies synchronously emit light flashes to attract mating partners [1]. This paper describes the adaptation of the underlying biological principle for a robust self-stabilizing distributed synchronization in wireless sensor networks.
An ensemble of nodes is synchronized in order to execute a collision-free communication schedule following a time-triggered paradigm [2]. The basic element of a time-triggered system is a global timebase that is distributed among the nodes through clock synchronization. In order to provide a common timebase, we propose applying the Reachback Firefly Algorithm (RFA), a firefly-inspired algorithm that works despite the limitations of current radio controllers, which are deaf to incoming transmissions while in sending mode. This deafness problem is mitigated by distributing the timing of light flashes in a given time window. Using the global timebase, communication activities are scheduled according to a predefined, periodic scheme. This simple but robust scheme enables the design of dependable distributed systems and simplifies system verification and diagnosis. Furthermore, the global synchronicity is used to enable synchronized sleep schedules in a wireless network cluster, which can save a considerable amount of energy at each node. This is especially useful in situations with low duty-cycles, for example, a sensor network that utilizes only a fraction of its available bandwidth. Because the message schedule is known a priori, the synchronized nodes are able to predict the timing of incoming messages and can turn off their receivers when no transmissions of interest are scheduled. Since listening on the channel is a significant energy consumer of a typical wireless sensor node, the overall power consumption can thus be reduced, extending battery lifetime. The global time can also support the application in tasks like timestamping, synchronous measurements, and timely coordinated distributed actions.
As a proof of concept, the algorithm has been evaluated by simulation and in a case study consisting of a network of battery-powered low-cost nodes based on an off-the-shelf IEEE 802.15.4 MAC layer. The evaluation results in this paper give realistic figures for the precision of the clock synchronization and the achievable savings in power consumption.
The rest of the paper is structured as follows. Section 2 describes the basic features and operation of the RFA. Section 3 presents the design of our approach consisting of clock synchronization, a modified RFA and an energy saving scheme. Sections 4 and 5 describe the evaluation of a case study implementation by simulation and on real hardware. Results are discussed in Section 6. Related work is treated in Section 7. The paper is concluded in Section 8.
2. Reachback Firefly Algorithm
The RFA was introduced in [3] and supports scalability, graceful degradation, and a simple calculation. The algorithm can be classified as a self-stabilizing distributed push-based clock synchronization algorithm. Its advantage is that it naturally provides self-stabilization, that is, from any initial configuration, the clocks eventually become synchronized. The concept is based on the Pulse-Coupled Biological Oscillators (PCO) phase advance synchronization model [4], but it is better suited to practical implementation in wireless networks. In particular, the following assumptions of the original PCO model make a practical application very difficult: (i) the oscillators have identical dynamics, (ii) nodes can fire instantaneously, (iii) every firing event is observed immediately, and (iv) all computations are performed perfectly and instantaneously.
To understand the principle behind the main concept of the PCO model, consider the following simple example: Assume two persons and want to synchronize their wrist watches, but each can only notify the other when their own watch indicates twelve o'clock. Let and denote the time of the persons' clocks. Every time a person is notified, that person advances their own watch by a factor (in our example ) to at most twelve o'clock. The higher the multiplication factor, the faster the clocks converge, but the less robust the system becomes to faulty notifications. This algorithm describes the simplified phase advance synchronization model of the fireflies, which is described in more detail in the next section. Based on the initial configuration and , Table 1 shows that after 5 periods the clocks are synchronized.
Table 1: A demonstration of the PCO model. The columns correspond to the ongoing time sequence.
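The wrist-watch example can be replayed in a few lines of code. The following is a minimal sketch, assuming a twelve-hour dial and purely hypothetical values for the multiplication factor and the initial watch times (the function name and its parameters are illustrative). It is not meant to reproduce the entries of Table 1.

```python
def notifications_until_sync(t_a, t_b, factor, limit=12.0, max_steps=100):
    """Count the notifications needed until both watches show `limit` together.

    t_a, t_b: initial watch times in hours (hypothetical values).
    factor:   multiplication factor applied by the notified person (> 1).
    Returns None if the watches do not synchronize within `max_steps`.
    """
    if t_a == t_b:
        return 0  # already synchronized
    for step in range(1, max_steps + 1):
        if t_a > t_b:
            # A reaches twelve o'clock first and notifies B.
            t_b += limit - t_a
            t_b = min(limit, t_b * factor)  # advance, but at most to twelve
            if t_b >= limit:
                return step  # both watches show twelve at the same time
            t_a = 0.0  # A starts its next period
        else:
            # B reaches twelve o'clock first and notifies A.
            t_a += limit - t_b
            t_a = min(limit, t_a * factor)
            if t_a >= limit:
                return step
            t_b = 0.0
    return None


# Hypothetical configuration: watches six hours apart, factor 1.5.
steps = notifications_until_sync(6.0, 0.0, 1.5)
```

A larger factor reduces the number of notifications needed, which matches the remark on convergence speed, while a factor of 1 leaves the initial offset unchanged.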
However, when all clocks are synchronized, they indicate the clock event at the same time. On a broadcast communication medium, this causes message collisions and, in many wireless systems, a “deafness” problem, since standard wireless transceivers cannot receive messages while in transmission mode.
The problem can be bypassed by sending the synchronization messages with a random offset, while transmitting the particular offset with the message. The receiver can then reconstruct the intended synchronization instant and perform a clock adjustment with respect to the received offset values. Obviously, this random offset results in an out-of-order reception of synchronization messages, which causes a problem for the simple synchronization approach. A solution to this problem is to gather all synchronization events until reaching the period end and then react to the received time information from the last period. This idea was introduced in [3] and is called reachback response. However, a reachback response variant of the mentioned simple synchronization approach then equals the RFA described below with and is proven in Lemma 3.7 to be infeasible for clock synchronization.
The formal description is based on the phase variable . This variable is characterized by (i) , where denotes the cycle period and (ii) at the beginning of a cycle. Let denote the state variable corresponding to the charge of a firefly. The authors of the PCO model have proven that the state function must be a smooth, monotonically increasing, and concave down function in order to achieve synchronicity. In [4], Mirollo and Strogatz have stated a general state function as shown in (1) whereas the form of the curve depends on a parameter named dissipation factor, denoted by , and measures the extent to which is concave down. Figure 1 visualizes the state function for different dissipation factors,
The coupling between the oscillators is defined by the firing function and depends on the state function and the pulse strength , where denotes the inverse state function
The firing function is calculated immediately after an oscillator receives a firing event (or flash in case of a firefly). We further use the term of phase advance to define the increase in the phase domain, denoted by . Due to the concave down state function, a constant addition in the state-domain results in a variable increase in the phase-domain where a phase advance in the beginning of a cycle is smaller than later in the cycle.
Figure 1: The state function dependent on different dissipation factors.
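The effect of the concave down state function on the phase advance can be checked numerically. The sketch below assumes the commonly cited Mirollo and Strogatz form of the state function; the symbols `b` (dissipation factor), `eps` (pulse strength), and `phi` (phase), as well as all function names, are naming assumptions made for this illustration.

```python
import math


def state(phi, b):
    """State function: smooth, monotonically increasing, concave down on [0, 1]."""
    return math.log(1.0 + (math.exp(b) - 1.0) * phi) / b


def state_inv(x, b):
    """Inverse of the state function."""
    return (math.exp(b * x) - 1.0) / (math.exp(b) - 1.0)


def fire(phi, b, eps):
    """New phase after receiving one firing event at phase phi (capped at 1)."""
    return min(1.0, state_inv(state(phi, b) + eps, b))


def phase_advance(phi, b, eps):
    """Increase in the phase domain caused by one firing event."""
    return fire(phi, b, eps) - phi


# Illustrative parameters: the same pulse strength yields a larger
# phase advance late in the cycle than early in the cycle.
early = phase_advance(0.2, b=3.0, eps=0.01)
late = phase_advance(0.6, b=3.0, eps=0.01)
```

With these illustrative parameters, `late` is larger than `early`, which is the behavior described for a constant addition in the state domain.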
To overcome the unrealistic assumptions of the PCO model in wireless networks, the RFA additionally uses a reachback response and pre-emptive message staggering. Pre-emptive message staggering means that a node broadcasts its synchronization message at some random time offset before it reaches the period end; it is thus able to gather the time information of all other nodes during a period with a lower probability of message collisions.
In the original PCO model, an oscillator immediately reacts to each firing event. In contrast, the reachback response records the timestamps of all received firing events and calculates an overall phase jump once at the end of each period which is then applied at the beginning of the next cycle. Thus, if a node reaches the period end, it “reaches back in time” and reacts to the firing events of the past period. This principle is visualized in Figure 2.
Figure 2: Comparison of (a) the original PCO model and (b) the RFA. In the PCO model, an oscillator immediately reacts to a firing event. In contrast, the RFA applies the overall phase jump at the beginning of the next cycle: .
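The interplay of message staggering and the reachback response can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used in the case study: phases are normalized to the range from 0 to 1, communication delay is ignored, a linear phase response with a hypothetical `coupling` factor is used, and all class and function names are invented for this example.

```python
import random
from dataclasses import dataclass, field


def staggered_transmission(rng, min_offset, max_offset):
    """Pick a random offset and the (earlier) phase at which to transmit.

    The offset is carried in the message so that the receiver can
    reconstruct the intended firing instant.
    """
    offset = rng.uniform(min_offset, max_offset)
    return 1.0 - offset, offset


@dataclass
class ReachbackNode:
    coupling: float                      # hypothetical linear coupling factor
    recorded: list = field(default_factory=list)

    def on_message(self, rx_phase, offset):
        """Record the reconstructed firing phase instead of reacting at once."""
        self.recorded.append(rx_phase + offset)

    def end_of_period(self):
        """Replay the recorded events in order and return the overall jump.

        The returned value is the phase at which the next cycle starts.
        """
        jump = 0.0
        for event_phase in sorted(self.recorded):
            # Each event is evaluated at the phase it would have had
            # if the earlier events had been applied immediately.
            jump += self.coupling * (event_phase + jump)
        self.recorded.clear()
        return jump


node = ReachbackNode(coupling=0.1)
node.on_message(rx_phase=0.75, offset=0.05)  # arrives out of order
node.on_message(rx_phase=0.40, offset=0.10)
next_start = node.end_of_period()
```

Sorting the recorded phases before replaying them is what makes the out-of-order reception caused by the random offsets harmless.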
A further problem in the PCO model occurs in the case of an already synchronized network comprising several nodes. In that case, all nodes trigger the transmission event for the synchronization message at the same time. As a result, the messages collide and the collision avoidance mechanism of the CSMA/CA scheme takes effect. The resulting delay jitter can then be avoided by using MAC timestamping. However, the backoff scheme of the IEEE 802.15.4 standard [5] allows a message to be backed off at most times, which results in a maximum backoff time of 36.48 milliseconds (at 2.4 GHz). Since the serialization delay of a full message is at most 4.256 milliseconds (133 bytes at 250 kbps), there can be at most active wireless nodes without losing messages in the best case. Therefore, a bigger network comprising more than nodes in the same broadcast domain requires an additional message staggering delay at an upper layer. A second reason for the additional message staggering is that the original IEEE 802.15.4 standard does not provide a MAC timestamping mechanism and thus does not allow the delay jitter caused by the backoff scheme to be reduced. The only way to reduce the delay jitter is then to modify the default values of some MAC-specific attributes in order to switch off the backoff mechanism. To avoid the resulting higher probability of transmission failures, the pre-emptive message staggering explicitly adds a timestamped random transmission delay to the firing messages at the application layer.
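The arithmetic behind the serialization delay can be retraced directly from the figures quoted above. The snippet below is only a back-of-the-envelope check; the variable names are illustrative, and the ratio it computes is a rough indication of how many full frames fit into the maximum backoff time, not a restatement of the node limit given in the text.

```python
FRAME_BYTES = 133          # full message size quoted in the text
BIT_RATE_BPS = 250_000     # 250 kbps
MAX_BACKOFF_MS = 36.48     # maximum backoff time quoted in the text

# Time needed to put one full frame on the air.
serialization_ms = FRAME_BYTES * 8 / BIT_RATE_BPS * 1000

# How many full frames fit back to back into the maximum backoff time.
frames_per_backoff_window = MAX_BACKOFF_MS / serialization_ms
```

The first value evaluates to 4.256 milliseconds, in agreement with the text; the second is between 8 and 9.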
3. Applying RFA to Wireless Sensor Networks
The principal purpose of many protocols used in sensor networks is to reduce power consumption through synchronized sleep schedules. Such an approach is also referred to as a low duty-cycle concept, where the transceiver module of all nodes is periodically activated only for a short time, with a period length from seconds up to hours. Our concept allows duty-cycling to be performed more effectively by utilizing a time-triggered approach in which a node takes advantage of the a priori known transmission events. These events are globally coordinated by the use of rounds stored in a file called the Round Description List (RODL) file. In the current implementation, such a round corresponds to a complete cycle of our synchronization algorithm. A round is further divided into a number of slots. Every node in a network must have its own RODL file, which statically assigns a communication activity to each slot in each round. This allows the setup of collision-free communication and further reduces energy consumption by switching off the transceiver when it is not required. Figure 3 shows the time diagram of a time-triggered approach for a single node. Therein, a period is subdivided into several slots, where each slot is either a receiving slot, a sending slot, an execution slot, or an idle slot. With regard to energy awareness, the most important slots are the receiving slots, since they determine how much energy is spent on listening and receiving. In the diagram, the first and the second slot are assigned to be receiving slots. Note that the active time of the receiver unit differs between these slots. This results from the automatic deactivation after the receiver has recognized the end of a transmission. The parameter denotes the synchronization window and guarantees that the receiver module is enabled some time before any transmission takes place.
Figure 3: The principle of the time-triggered approach.
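A round with statically assigned slots can be modeled as a small data structure. The following sketch is hypothetical: the actual RODL file format is not described here, so the slot fields, the time unit, and the rule for the receiver on-time (synchronization window plus the expected transmission time, never longer than the window plus the slot) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SlotKind(Enum):
    RECEIVE = "receive"
    SEND = "send"
    EXECUTE = "execute"
    IDLE = "idle"


@dataclass(frozen=True)
class Slot:
    kind: SlotKind
    length_ms: float
    expected_airtime_ms: float = 0.0  # only meaningful for RECEIVE slots


def receiver_on_time_ms(round_slots, sync_window_ms):
    """Total time per round during which the receiver has to be enabled."""
    total = 0.0
    for slot in round_slots:
        if slot.kind is SlotKind.RECEIVE:
            # Wake up one synchronization window early and switch off
            # as soon as the end of the transmission is recognized.
            listen = min(slot.expected_airtime_ms, slot.length_ms)
            total += sync_window_ms + listen
    return total


# Hypothetical round of 100 ms with two receiving slots of different load.
example_round = [
    Slot(SlotKind.RECEIVE, 10.0, expected_airtime_ms=2.0),
    Slot(SlotKind.RECEIVE, 10.0, expected_airtime_ms=4.0),
    Slot(SlotKind.SEND, 10.0),
    Slot(SlotKind.IDLE, 70.0),
]
on_time = receiver_on_time_ms(example_round, sync_window_ms=1.0)
duty_cycle = on_time / sum(slot.length_ms for slot in example_round)
```

In this made-up round the receiver is enabled for 8 ms out of 100 ms. The two receiving slots contribute different on-times because the receiver is switched off at the end of each transmission.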
The time-triggered approach requires the notion of a global time which is provided by the RFA clock synchronization algorithm. Note that the algorithm can only approximate the global time. The best achievable precision in an ensemble of clocks is lower bounded to the convergence function of the synchronization algorithm and the maximum drift offset , where denotes the maximum drift rate of all clocks in that ensemble. This is also known as the synchronization condition . In our approach, the convergence function is defined by the RFA and heavily depends on the maximum delay jitter which is the maximum absolute deviation of the delay a message encounters during the communication.
In order to get promising results, the global time must be approximated with a very high precision. One way is to minimize the drift offset. This can be done either by using high-quality crystal oscillators or by resynchronizing more frequently. Both approaches have drawbacks. First, in mass production, crystal oscillators are expensive compared to the cheap internal RC-oscillators in low-cost nodes. Second, a shorter period results in the exchange of more synchronization messages in the same time and thus increases the energy consumption. Alternatively, the maximum drift rate can be reduced by a rate correction algorithm. In our approach, this algorithm is performed in the digital domain and makes use of the concept of virtual clocks. A virtual clock abstracts the physical clock by the use of macroticks. A macrotick comprises several microticks, which are generated by a physical clock. The principle of this concept is to change the number of microticks representing a macrotick in order to adjust the granularity and frequency of the virtual clock. In the current implementation, a macrotick corresponds to a complete cycle length. Thus, the duration of the periods can easily be changed by adjusting the threshold value of the physical timer/counter.
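The relation between microticks, macroticks, and the adjustable threshold can be illustrated with a small model. This is a sketch only: the class and method names are invented, and a real node would implement the counter in a hardware timer/counter rather than in software.

```python
class VirtualClock:
    """Virtual clock whose macrotick length is a number of microticks."""

    def __init__(self, nominal_microticks):
        self.nominal = nominal_microticks      # nominal threshold level
        self.threshold = nominal_microticks    # current threshold level
        self.microticks = 0
        self.macroticks = 0

    def adjust(self, delta):
        """Lengthen (delta > 0) or shorten (delta < 0) the macrotick."""
        self.threshold = self.nominal + delta

    def on_microtick(self):
        """Called once per tick of the physical clock."""
        self.microticks += 1
        if self.microticks >= self.threshold:
            self.microticks = 0
            self.macroticks += 1


# A physical clock that runs 1 % slow is compensated by shortening the
# macrotick from a nominal 1000 microticks to 990 microticks.
clock = VirtualClock(nominal_microticks=1000)
clock.adjust(-10)
for _ in range(1980):
    clock.on_microtick()
```

After 1980 microticks the model has counted two macroticks, so the period of the virtual clock is changed without touching the physical oscillator.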
3.1. Clock State Correction
The clock state synchronization is established by the RFA model and uses the definition of the smooth, monotonically increasing, and concave down state function of (1) to calculate the overall phase advance . Consider the dissipation factor and the pulse strength within , then the phase advance equals
The direct implementation of all these functions would result in a time-consuming calculation process. Therefore, we simplified the equation by inserting the inverse function in (3). Let and , then (3) can be transformed to
Assuming a strong dissipation factor and a small pulse strength s.t. , then we can replace by the first-order approximation of the Taylor expansion and thus is negligible. The phase advance then can be reduced to
As a result, we have a linear Phase Response Curve (PRC), where the coupling factor specifies the strength of coupling between the oscillators and depends on the product of the dissipation factor and the pulse strength . This result is similar to the simplified firing function described in [3].
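For orientation, the simplification can be retraced step by step. The derivation below is a sketch that assumes the commonly cited Mirollo and Strogatz state function; the symbols $\varphi$ (phase), $b$ (dissipation factor), $\varepsilon$ (pulse strength), and $\alpha$ (coupling factor) are chosen here for illustration, and the cap of the phase at 1 is ignored.

```latex
\begin{align*}
f(\varphi) &= \frac{1}{b}\,\ln\!\bigl(1 + (e^{b} - 1)\,\varphi\bigr),
&
f^{-1}(x) &= \frac{e^{bx} - 1}{e^{b} - 1}, \\[4pt]
\Delta(\varphi) &= f^{-1}\bigl(f(\varphi) + \varepsilon\bigr) - \varphi
 = \frac{e^{b\varepsilon}\bigl(1 + (e^{b} - 1)\,\varphi\bigr) - 1}{e^{b} - 1} - \varphi \\
&= \bigl(e^{b\varepsilon} - 1\bigr)\,\varphi + \frac{e^{b\varepsilon} - 1}{e^{b} - 1}, \\[4pt]
\Delta(\varphi) &\approx b\varepsilon\,\varphi + \frac{b\varepsilon}{e^{b} - 1}
 \approx \alpha\,\varphi, \qquad \alpha = b\varepsilon,
\end{align*}
```

The first approximation uses $e^{b\varepsilon} \approx 1 + b\varepsilon$ for small $b\varepsilon$, and the second drops the constant term, which becomes small for a large dissipation factor. The resulting phase advance is linear in the phase, with a slope given by the product of dissipation factor and pulse strength.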
In contrast to the original RFA algorithm, our approach achieves a better synchronization precision and a faster convergence time by indirectly performing a clustering of the received synchronization events. This is done by ignoring all events which are within the phase advance of the last event to which a node reacted. In fact, this corresponds to the introduction of a short refractory period. Additionally, we do not allow a node to react to firing events which would originally occur after the node reached the period end. This ensures that, in the case of synchronized nodes, the fastest node does not advance its phase anymore, resulting in a better precision. The algorithm is formally analyzed in more detail below and guarantees network synchronization as long as the bounds on several parameters are maintained. Algorithm 1 explains the behavior of this extended RFA (E-RFA) algorithm with the use of pseudocode. The refractory period is implemented by the condition in Line 9. The variable eventset contains the correct phase of all received firing messages and denotes the random amount by which the transmission is advanced, bounded by the maximum and minimum message staggering delay and , respectively.
Since the purpose of this work is to demonstrate that such a synchronization approach works with an off-the-shelf communication stack without MAC timestamping, we have to expect a delay jitter in the order of milliseconds due to the uncertainty in the application and MAC layer. It should be mentioned that Lundelius and Lynch have shown in [6] that in the presence of a maximum delay jitter , an ensemble of clocks cannot be synchronized to a precision better than .
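The two filtering rules described above can be sketched in code. The function below is one possible reading of the prose, not a transcription of Algorithm 1: it assumes normalized phases, a linear phase response with a hypothetical `coupling` factor, and a refractory window that spans the phase advance following the last event a node reacted to. The line numbers cited in the text refer to the original pseudocode and have no counterpart here.

```python
def erfa_phase_jump(event_phases, coupling):
    """Overall phase jump for one period under the two filtering rules.

    event_phases: reconstructed phases of the received firing events.
    coupling:     linear coupling factor (illustrative value).
    """
    jump = 0.0
    ignore_until = float("-inf")  # end of the current refractory window
    for phase in sorted(event_phases):
        if phase < ignore_until:
            # Rule 1: the event lies within the phase advance of the last
            # event the node reacted to, so it is treated as the same cluster.
            continue
        shifted = phase + jump
        if shifted >= 1.0:
            # Rule 2: after the earlier jumps, this event would fall behind
            # the node's own period end, so the node does not react to it.
            break
        advance = coupling * shifted
        jump += advance
        ignore_until = phase + advance
    return jump


# The event at 0.52 is clustered with the one at 0.50, and the event at
# 0.97 would fall behind the period end once the earlier jumps are applied.
total_jump = erfa_phase_jump([0.50, 0.52, 0.80, 0.97], coupling=0.1)
```

Under this reading, a perfectly synchronized group of senders is treated like a single sender, which is the clustering effect the paragraph describes.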
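To get a feeling for this limit, the bound of Lundelius and Lynch can be evaluated for concrete values. The helper below assumes the commonly cited form of the result, in which the precision cannot be better than the delay uncertainty multiplied by (1 - 1/n) for n clocks; the function name, the parameter names, and the example values are illustrative.

```python
def lundelius_lynch_bound(delay_uncertainty, n_clocks):
    """Best achievable precision for n clocks under a given delay uncertainty."""
    if n_clocks < 2:
        raise ValueError("at least two clocks are required")
    return delay_uncertainty * (1.0 - 1.0 / n_clocks)


# Hypothetical example: 2 ms delay uncertainty in an ensemble of 4 nodes.
bound_ms = lundelius_lynch_bound(2.0, 4)
```

For two clocks the bound is half of the delay uncertainty, and it approaches the full uncertainty as the ensemble grows.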
Lower Bound for the Coupling Factor
We assume that every processor contains a hardware clock which generates the phase . This clock stays within a linear envelope of the real time. Note that whereas the hardware clock continuously increases, the phase is periodically reset to with respect to where denotes a dynamic offset value which changes due to the state correction algorithm. represents the granularity of the hardware clock which corresponds to the synchronization period . We therefore assume that there exists a positive constant (maximum drift rate) such that . Note that this definition of the bounded drift simplifies the calculation of the precision and may differ from the literature. We further assume a fully connected network in which the message delay is always in the range , where denotes the constant part and the maximum delay jitter of the communication delay in real time. The lower bound for in an ensemble of nodes in a fully connected network then depends on the maximum drift rate , the message staggering delay , and the communication delay . Note that all parameters with a preceding are defined with respect to . However, for simplification we now always assume that is normalized to . Let , respectively, denote the maximum, respectively, minimum relative message staggering delay and . We now show that in the case of two clocks, the modified RFA provides a bounded precision . Therefore, for denotes the maximum time difference among all nodes in real time units between the time reached and the time reached .
Lemma 3.1. Let and be the drift offset. In the case of two clocks and no message loss, if the coupling factor is lower bounded to and , then for and , Algorithm 1 keeps the network synchronized with a worst case precision bounded to
Proof. Assume the clocks are initially synchronized to . W.l.o.g. let be the faster node. We further use as the reference for the precision , where denotes the real time when 's phase reached . We further assume that the next time reaches the threshold is at time . Let be the corresponding precision at . For we then have and . Let respectively, denote the relative message staggering delay the node , respectively, has calculated for the last transmission. If the last fire event of was at , then with respect to the communication delay received the phase at and consequently adds the offset leading to . Similarly, a fire event from with offset is received by at phase . Let be the minimum, respectively, maximum possible phases of the calculated firing events. If , then it is guaranteed that , respectively, . Since , we have as stated.
Based on the current precision and the phase advance of and at time labeled by and , we are able to calculate the precision the next time reaches the threshold. That is, . However, we have to distinguish between three cases depending on . In detail, if , then and , or if , then also and , or finally if , then due to Line 4 of Algorithm 1 we have and . Note that the overlapping of and is intentional, because if , then both cases can occur and hence must be considered. Further note that the bound of ensures that the intersection point of the phase of both nodes is within the last period. In order to keep the clocks within the precision, the inequality must be valid for all three cases. From the first case we get and . From the third case it follows and . Note that is always valid due to the definition of . From the second case, it can be derived that and . Again, ensures that is valid. The worst case precision with respect to these three cases then equals .
Note that the correctness of the proof requires that a node advances its phase at most once per period. However, if , then may initiate a firing event after already passed the threshold. Simply setting avoids this effect.
In order to get the worst case precision, we further have to incorporate the precision (I) and (II) for all three mentioned cases. In detail, for we additionally have to analyze for case if the equation holds and for case , if and are valid. Similarly for it must be ensured that . From these equations we can derive the following additional bounds: , and . Therefore, if we want bounded between , then must hold. Furthermore, in the case of , we have to adapt the worst case precision to which now equals the worst case upper bound, since all possible cases were considered.
Finally, it should be mentioned that the maximum relative message staggering delay must be smaller than . Otherwise, assume the case where both nodes are initially apart. Then both nodes will never perform a phase advance due to Line 4 of the algorithm.
Note that in the case of a fully connected network comprising more than two nodes, all nodes synchronize to the fastest one due to Line 4 and Line 9 of Algorithm 1. Especially the condition in Line 9 ensures if a node advances its phase due to some received firing event , then all events immediately following some short time after are ignored. This condition is necessary. Otherwise, assume nodes are perfectly synchronized. Consequently, a node would perform times a phase advance, which results in a mutual excitation in the case is very large.
Theorem 3.2. Let and be the drift offset. In the case of clocks and no message loss, if the coupling factor is lower bounded to and , then for and , Algorithm 1 keeps the network synchronized with a worst case precision bounded to
Corollary 3.3. If a fully connected network consists only of perfect clocks () and the communication network suffers from no delay jitter (), then the network remains synchronized with a precision of , if .
Note that Corollary 3.3 states that it is sufficient that the network is connected.
Corollary 3.4. If a fully connected network consists only of perfect clocks () and the communication network suffers only from delay jitter (, ), then the network remains synchronized with a precision of , if .
Corollary 3.5. If a fully connected network consists of clocks with a maximum drift rate of and the network suffers from no communication delay () and , then the network remains synchronized with a precision of , if .
Upper Bound for the Coupling Factor
One may ask why one should not set such that a node immediately adjusts its phase to a neighboring clock every time it receives a firing message from this clock. However, the following lemmata show that there exists a basic upper bound which holds for every network.
Definition 3.6. A firing configuration of a fully connected network comprising nodes is defined to be the concatenation of the phase of node at the time when just reached the threshold for the th time and consequently applied the phase advance .
Lemma 3.7. In a fully connected network comprising of perfect clocks, if the coupling factor , then the nodes may never become synchronized.
Proof. The proof is based on the fact that if is too large, then the nodes will infinitely often enter the same firing configuration. Let and be the two participating processors where is the first node reaching the threshold. The initial firing configuration then is with . Next, reaches the threshold leading to with and . The next time reaches the threshold is at with and . Finally again reaches the threshold at with and .
If we assume that , then the phase advance can be reduced to . The same applies to and . Thus, if all three conditions are true, can be redefined to . In other words, the nodes will infinitely often enter the initial firing configuration. We now have to find the lowest where the inequality is valid. Equating all three conditions yields and . Thus we get .
Since the algorithm ignores all firing events immediately following some short time after a previous firing event due to Line 9, a node may realize a set of nodes as a single node and therefore Lemma 3.7 also applies to networks comprising more than two nodes. We now exploit the intuition behind Lemma 3.7 and extend this problem to a general network comprising nodes.
Definition 3.8. is called an infeasible firing configuration, if there exists a positive integer such that and the network is not synchronized.
Lemma 3.9. The maximum phase advance a node can perform in a fully connected network comprising nodes equals .
Proof. The maximum phase advance occurs if the firing events are spaced as closely as possible without any event being ignored due to Line 9 of Algorithm 1. In detail, assume a node received the firing event at the phases . The first phase advance then equals , where . Due to Line 9 of Algorithm 1, the earliest next time the node performs a phase advance can only be at and equals . Generally, and for . Solving the recursion leads to and thus . Solving the equation for then yields . The overall phase advance thus equals . Since the maximum occurs when , we finally get .
A weak upper bound results from the fact that we do not want a node to perform a phase advance which is greater than and directly follows from Lemma 3.9.
Corollary 3.10. In a fully connected network comprising of perfect clocks, if the coupling factor , then in every feasible execution a node will never perform a phase advance which is greater than .
Note that even if the weak bound is maintained, it can be shown that there exist infeasible firing configurations. However, due to imprecisions in calculations, the varying short-term drift, the delay jitter, and several other nondeterministic environmental effects, this bound is generally applicable. A stronger bound results from empirical studies, which have shown that infeasible firing configurations do not exist if the maximum phase advance . The resulting bound for again can be deduced from Lemma 3.9.
Theorem 3.11. In a fully connected network comprising of perfect clocks, if the coupling factor , then the nodes will never enter an infeasible firing configuration.
Rate of Synchronization
Theorem 3.15 analyzes the time to sync for the case of two oscillators. The authors of [4] have also analyzed the case of oscillators. However, considering a multihop topology requires a more sophisticated solution. For the following proofs, let and denote the initial phase difference between the clocks and with in network .
Lemma 3.12. The infeasible firing configuration with and is a unique fixpoint and has a phase difference of .
Proof. If we set , we get and and thus and .
Although this fixpoint is a repeller, the roundoff error in the calculation may cause a node to enter the fixpoint. This is especially a concern if the granularity of the hardware clock is very coarse. The rate of sync with respect to different initial phase differences is visualized in Figure 4. It is obvious that there exists a special initial configuration which causes the network to enter this fixpoint. To analyze this initial configuration, we first transform the recursion of the dynamic system into a closed form.
Figure 4: The rate of sync for different initial configurations with .
Lemma 3.13. The phase difference of for equals
where , , , , , , and from Lemma 3.12.
Proof. Let be the initial firing configuration with where and . The phase difference when reached the threshold for the th time is . From Lemma 3.7 we know that with and . If we substitute for and consider the phase difference of , we get and which yields for . Solving the recursion is left to the reader and leads to the solution as stated.
Lemma 3.14. There exists a unique initial phase difference where the network eventually enters the fixpoint of Lemma 3.12 and equals with from Lemma 3.13.
Proof. If the network enters the fixpoint in at some , then we have a phase difference of for with from Lemma 3.12. Using (8) then yields . Since we get and thus . Using and from Lemma 3.13 results in . The initial phase difference then has to be as stated.
Theorem 3.15. The number of iterations until synchrony is at most with , and from Lemma 3.13 and
Proof. Note that either converges to or ) as visualized in Figure 4. Therefore, we simply equate (8) with if or with if . Since for and the multiplicative factor is smaller than , the term with respect to does not influence the rate of sync for larger and hence can be neglected. This leads to the equation as stated.
3.2. Clock Rate Calibration
The concept of clock rate calibration combats the problem of frequency deviations due to the high clock drift of the RC-oscillators usually used in low-cost devices. This approach should allow a longer resynchronization interval with the same synchronization precision. Note that the rate correction can be performed completely independently of the clock state correction scheme.
The core concept of our rate calibration algorithm is that a processor implements a virtual clock which abstracts the hardware clock . The algorithm implemented on then only reads the time from the . We further denote the ticks from the by microticks and that from the by ticks. One tick of comprises several microticks which we denote by . By adjusting , the time duration of one tick can be increased or decreased. Let be the nominal threshold level and the absolute adjustment value s.t. . Note that the corresponding relative adjustment value for equals . In order to perform the rate calibration, every processor periodically broadcasts a synchronization message . Let , respectively, denote the timestamps of when transmitted and , respectively, the timestamps of when received from . Let be the th message received from and the th message broadcasted. We further assume that is not received at some before is received for . The dependency between the virtual and the hardware clock with respect to some message is characterized as . Note that we assume that the hardware clock is a linear function of real time within a sufficiently long period of time. In contrast, the virtual clock is periodically reset with respect to the resynchronization interval. This assumption is required in order to realize a pulse synchronization scheme.
The rate correction algorithm works as follows: based on the timestamp stored in