Zhenjie Ma, Haoran Wang, Ke Shi, Xinda Wang, "Learning Automata Based Caching for Efficient Data Access in Delay Tolerant Networks", Wireless Communications and Mobile Computing, vol. 2018, Article ID 3806907, 19 pages, 2018. https://doi.org/10.1155/2018/3806907
Learning Automata Based Caching for Efficient Data Access in Delay Tolerant Networks
Effective data access is one of the major challenges in Delay Tolerant Networks (DTNs), which are characterized by intermittent network connectivity and unpredictable node mobility. Various data caching schemes have been proposed to improve the performance of data access in DTNs. However, most existing schemes perform poorly because global network state information is unavailable and the network topology keeps changing. In this paper, we propose a novel data caching scheme based on cooperative caching in DTNs, aiming to improve the success rate of data access and reduce the data access delay. In the proposed scheme, learning automata are used to select a set of caching nodes, the Caching Node Set (CNS). Unlike existing caching schemes, which fail to address the challenging characteristics of DTNs, our scheme adjusts itself automatically to the changing network topology through voting and updating processes. Simulations verify the feasibility of our scheme and show that it improves the overall performance of data access in DTNs compared with existing caching schemes.
Delay Tolerant Networks (DTNs) are proposed as a network architecture to address communication issues in challenging environments where network connectivity is subject to frequent and lasting disruptions. DTNs consist of a number of communicating devices that contact each other opportunistically, so only intermittent connectivity exists. As a result, DTNs are normally characterized by unpredictable node mobility, high forwarding latency, and the lack of global network state information. To support data access in DTNs, a “carry-and-forward” mechanism is used for data transmission. When transmitting data, each mobile node acts as a relay that stores passing data and forwards it when it contacts other nodes.
DTNs have been introduced into many recent applications. For instance, users with personal mobile devices utilize mobile P2P networks to share data in a certain area. DTNs can also be used in mobile military communication networks to deliver real-time battlefield information locally. In these applications, mobile nodes not only request data but also generate data themselves. Meanwhile, all mobile nodes are also responsible for forwarding passing data. It is necessary to design an appropriate network scheme to coordinate all mobile nodes to cache and forward data in DTNs.
Caching techniques have been widely used to improve data access performance. The basic idea of caching in networks is to store data at appropriate locations so that future requests for the data can be answered promptly. Although caching techniques have been extensively studied in traditional networks and wireless networks, they are rarely applied to DTNs because of the harsh network environment. Intermittent connectivity makes it difficult to determine an appropriate set of caching nodes, and unpredictable node mobility makes it hard to maintain such a set as the network topology changes.
To address the data caching and access problems in DTNs, a number of caching schemes have been proposed. Some of the schemes select a set of caching nodes based on probabilistic metrics, and others use Markov chains to predict node mobility and reduce data access delay. However, most existing schemes remain impractical, since the global network state information they require is hard to obtain and may change constantly in DTNs.
To overcome the aforementioned issues in DTNs, we propose a novel data caching scheme based on distributed learning automata. Our basic idea is to select and maintain a set of caching nodes called the Caching Node Set (CNS), which caches data and answers data requests from other nodes. When nodes generate new data, the data are intentionally disseminated to all caching nodes. In addition, the CNS is updated in real time through the learning automata mechanism to adapt to network topology changes. As the CNS is updated, data are redistributed among the latest caching nodes by cache replacement strategies. The major contributions of the paper are as follows:
(i) We propose a novel caching scheme to coordinate multiple caching nodes for addressing data access and to improve the overall performance of networks in DTNs.
(ii) We propose a novel algorithm utilizing distributed learning automata to construct the optimal Caching Node Set (CNS) to address the network topology change in DTNs. To maintain the CNS in real time, two processes, the voting process and the updating process, are introduced into our scheme.
(iii) We propose a distributed algorithm requiring no global network state information to fulfill our scheme, which improves the practicability of our scheme for DTN-based applications.
The remainder of the paper is organized as follows. Section 2 describes the basic idea of our approach; Section 3 describes the design and implementation of learning automata based caching node selection; Section 4 describes cache-based data access; Section 5 presents our evaluation; Section 6 reviews related work; Section 7 concludes the article; and Section 8 gives a glimpse of future work in this area.
2.1. Problem Statement
DTNs can be described by a network contact graph $G(t) = (V(t), E(t))$, where each vertex in $V(t)$ represents a network node in DTNs at time $t$ and each edge $e_{ij}(t) \in E(t)$ represents a contact between node $i$ and node $j$ at time $t$. Due to node mobility, some nodes may leave the network while new nodes may join; thus $V(t)$ changes over time. Similarly, $E(t)$ represents opportunistic contacts in DTNs and changes over time as well, so edge $e_{ij}(t)$ exists only when the node pair $(i, j)$ contacts each other at time $t$; otherwise the edge does not exist.
We assume that when node and node contact each other at time , that is, edge exists at time , data can be forwarded from node to node . The network model and an example of data transmission in DTNs are shown in Figure 1. The data are assumed to be transmitted from node to node .
Figure 1: The network model and an example of data transmission in DTNs. (a) Time a. (b) Time b. (c) Time c. (d) Time d.
In Figure 1(a), at time , there are six nodes in the network, , , , , , , and . And there are three edges, , , and , which indicates that only these three node pairs can transmit data in the network.
In Figure 1(b), at time , node generates new data and prepares to transmit it to node . Since there is no direct end-to-end path from node to node at time , node has to forward the data to node that relays the data to node . Node has to cache the data because it cannot contact other nodes and forward the data immediately. Meanwhile the network topology changes and node leaves the network.
In Figure 1(c), after a while, at time , node moves and contacts node . Then node forwards the data to node and node will cache the data until it contacts other nodes.
In Figure 1(d), at time , node contacts node , then node forwards the data to node , and the data transmission finishes. Meanwhile node may move and join the network again.
Therefore, to improve the performance of data access, we need to find a shortest path between the requesting node and the hosting node. Here the shortest path means the path with the highest data delivery probability, which is implemented through the contacts between the nodes along this path.
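The carry-and-forward behavior walked through above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the contact trace, the node names `A` to `D`, and the function `earliest_delivery` are hypothetical, and the copy is spread to every contacted node (flooding) rather than along the single highest-probability path that the scheme aims for.

```python
from typing import Iterable, Optional, Tuple

# (time, node_u, node_v); a contact is assumed to be usable in both directions.
Contact = Tuple[float, str, str]


def earliest_delivery(contacts: Iterable[Contact], source: str,
                      destination: str, start_time: float = 0.0) -> Optional[float]:
    """Return the earliest time a copy can reach `destination`, or None."""
    if source == destination:
        return start_time
    carriers = {source}  # nodes that currently cache a copy of the data
    for time, u, v in sorted(contacts):
        if time < start_time:
            continue  # this contact ended before the data was generated
        if u in carriers or v in carriers:
            carriers.update((u, v))  # the contact lets the copy be forwarded
        if destination in carriers:
            return time
    return None


# Hypothetical trace: A meets B, later B meets C, later C meets D.
trace = [(1.0, "A", "B"), (2.0, "B", "C"), (3.0, "C", "D"), (0.5, "C", "D")]
print(earliest_delivery(trace, "A", "D"))
```

The early contact between `C` and `D` at time 0.5 is useless for this transfer because neither node holds the data yet, which is why relays must cache the data until a later contact occurs.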
2.2. Learning Automata
A learning automaton is a self-adaptive unit designed to achieve certain goals by learning through repeated interactions with a random environment, following a predefined sequence of rules. At each instant of the learning process, the learning automaton chooses an action from a finite set of actions based on a probability distribution. For each action taken, the environment responds with a reinforcement signal, and the learning automaton then updates its action probability vector according to this feedback. The relationship between the learning automaton and the random environment is shown in Figure 2. The objective of learning automata is to evolve progressively to a desired state.
Learning automata can be classified into two main structures: fixed structure learning automata and variable structure learning automata. Learning automata with variable structure are represented by a triple $\langle \alpha, \beta, T \rangle$, where $\alpha$ denotes the set of actions, $\beta$ denotes the set of feedback signals, and $T$ denotes the learning algorithm. The learning algorithm is a recurrence relation used to modify the action probability vector. Let $\alpha_i(n)$ and $p(n)$ denote the action chosen at time $n$ and the action probability vector on which the chosen action is based, respectively. The recurrence equations for updating the action probability vector are shown by (1) and (2):
$$p_j(n+1) = \begin{cases} p_j(n) + a\,[1 - p_j(n)], & j = i, \\ (1 - a)\,p_j(n), & \forall j \neq i, \end{cases} \quad (1)$$
when the taken action is rewarded by the environment (i.e., $\beta(n) = 0$) and
$$p_j(n+1) = \begin{cases} (1 - b)\,p_j(n), & j = i, \\ \dfrac{b}{r - 1} + (1 - b)\,p_j(n), & \forall j \neq i, \end{cases} \quad (2)$$
when the taken action is penalized by the environment (i.e., $\beta(n) = 1$). $r$ is the number of actions from which the learning automaton can choose.
$a$ and $b$ in the recurrence equations denote the reward and penalty parameters. If $a = b$, the learning algorithm is called the linear reward-penalty ($L_{R-P}$) algorithm. If $a \gg b$, it is called the linear reward-$\epsilon$ penalty ($L_{R-\epsilon P}$) algorithm. And if $b = 0$, it is called the linear reward-inaction ($L_{R-I}$) algorithm.
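As a concrete illustration of the reward and penalty recurrences for a variable structure learning automaton, the sketch below implements the standard linear update. The function name `update_probabilities` and the default parameter values are assumptions made for the example; setting `b = 0` gives the reward-inaction variant.

```python
from typing import List


def update_probabilities(p: List[float], chosen: int, rewarded: bool,
                         a: float = 0.1, b: float = 0.1) -> List[float]:
    """Apply one linear reward-penalty step to the action probability vector."""
    r = len(p)  # number of actions
    if r < 2:
        raise ValueError("a learning automaton needs at least two actions")
    new_p = []
    for j, pj in enumerate(p):
        if rewarded:
            # Move probability mass toward the chosen action.
            new_p.append(pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj)
        else:
            # Move probability mass away from the chosen action.
            new_p.append((1.0 - b) * pj if j == chosen
                         else b / (r - 1) + (1.0 - b) * pj)
    return new_p


p = [0.5, 0.5]
p = update_probabilities(p, chosen=0, rewarded=True)   # action 0 becomes more likely
p = update_probabilities(p, chosen=0, rewarded=False)  # action 0 becomes less likely
print(p, sum(p))
```

Both branches keep the vector normalized, so the result can be used directly as the distribution for the next action choice.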
Distributed learning automata consist of a number of learning automata which form a network and achieve a global optimal result cooperatively. Each learning automaton in the network updates its own action probability vector based on the feedback signals received from neighboring learning automata.
2.3. Main Idea
Our basic idea is to let the nodes decide automatically whether to cache data using a learning automata mechanism. The nodes that cache the data are selected as caching nodes and cooperatively construct the Caching Node Set (CNS). In our proposed scheme, each network node is assigned a learning automaton. When the DTN starts operating, the learning automata in all nodes are activated and start constructing the CNS to cache data and answer data requests based on the node state information. Following the rules of the learning automata algorithm, a number of easily accessed nodes become caching nodes and together form the CNS. More importantly, to cope with the changing network environment in DTNs, the CNS uses the learning automata to update itself in real time as the network topology changes.
The construction and maintenance of CNS are shown in Figure 3. For each node, the action set of learning automata includes two actions, setting as a CNS node and setting as a non-CNS node. At the beginning, the node chooses not to be a CNS node. After that, a set of nodes making frequent contacts with other nodes is selected as CNS, which corresponds to the gray nodes in Figure 3(a). All other nodes in DTNs are able to contact a node in CNS at least once in a period of time with high probability. Node and node are selected as caching nodes, and arrow lines indicate the opportunistic contacts among nodes. It is necessary to ensure that nodes , , , , , and can contact node or node in a certain period of time. When networks topology changes along with time, the CNS is updated to cover all nodes in network as much as possible. In Figure 3(b), with the mobility of nodes and variability of contacts among nodes, node and node are selected as new caching nodes while node ceases to be a caching node.
Once the CNS is constructed, we focus on utilizing the available node buffer to improve the performance of data access. When a node generates new data with a globally unique identifier and a finite lifetime, copies of the data are disseminated to all caching nodes as quickly as possible. When a node requests data, it sends queries to neighboring caching nodes to pull the data. If the caching nodes hold copies of the requested data, the request is answered; otherwise the request fails.
The dissemination of data is shown in Figure 4(a), where the dashed lines indicate the transmission of data. New data are generated by node and then disseminated to all caching nodes using the data forwarding strategy. When the CNS changes or nodes run out of storage, cached data are replaced and redistributed to ensure data accessibility. The data replacement is shown in Figure 4(b). When node ceases to be a caching node, data cached in node are removed and redistributed to other caching nodes. The overall performance of a caching scheme relies on prompt data dissemination and optimal data replacement strategies.
3. Learning Automata Based Caching Node Selection
In this section, we discuss how to construct and maintain CNS based on distributed learning automata.
To introduce our scheme, we first present several new concepts and terms used in it.
A vote is a message sent from one node to another in our scheme. Votes are assigned different weight values, which are used to select the caching nodes in the network. In total, there are VOTE_K possible weight values that can be assigned to votes. According to the expected utility principle, the weight value of a vote can also be regarded as the expected quantity of transmitted data. The weight values of votes are decided by the frequency of node contacts and , referring to the sum of the votes weight value a node receives.
The VOTE_K possible weight values indicate that a node has VOTE_K possible choices of nodes to which it can transmit data, ordered by vote weight value from high to low. When transmitting data, the node prefers the node that receives the greatest vote weight value. If that node is unreachable or disabled, the node transmits the data to the node with the second greatest vote weight value, and so on. In this way, we can improve the success rate of data access, decrease the loss of data, and reduce the transmission delay.
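The ranked fallback described above can be sketched as follows. This is a minimal illustration, assuming a node keeps a mapping from neighbor ID to the vote weight it assigned; the function `choose_next_hop`, the value chosen for `VOTE_K`, and the tie-breaking by node ID are hypothetical details, not taken from the paper.

```python
from typing import Dict, Iterable, Optional

VOTE_K = 3  # assumed value; the text only states that VOTE_K choices exist


def choose_next_hop(vote_weights: Dict[str, float],
                    reachable: Iterable[str], k: int = VOTE_K) -> Optional[str]:
    """Pick the reachable node with the greatest vote weight among the top k."""
    reachable_set = set(reachable)
    # Rank candidates from high to low weight; ties are broken by node ID.
    ranked = sorted(vote_weights, key=lambda n: (-vote_weights[n], n))[:k]
    for node in ranked:
        if node in reachable_set:
            return node  # first reachable candidate in rank order
    return None  # none of the k preferred nodes is currently reachable


weights = {"B": 5.0, "C": 3.0, "D": 1.0, "E": 0.5}
print(choose_next_hop(weights, reachable=["C", "E"]))
```

Here `B` has the greatest weight but is unreachable, so the data go to `C`, the candidate with the second greatest weight.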
Besides direct contacts between two nodes, data in DTNs may be transmitted from the starting node to the destination node through multiple contacts. Therefore, we present two different types of votes, direct votes and indirect votes. When node votes for node , the direct vote is the vote that node receives from node and the weight value of the direct vote is added to of node directly. And the indirect vote is the vote node relays to another neighboring node through node . Generally, node is unaware of the vote from node until node sends it to node . Therefore node has to cache the indirect vote until node contacts node . When receiving the indirect vote, node adds the weight value of the indirect vote to .
Voting plays an essential part in caching nodes selection. However, only direct voting between two nodes in contact is inadequate; we propose indirect voting to refine the scheme. The main purpose of indirect voting is to intentionally redirect the votes weight values to the nodes that tend to be caching nodes, which increases the difference of values of all nodes and makes it more conducive to selecting an appropriate Caching Node Set. Otherwise, the votes may be dispersed among all the nodes uniformly. The uniform distribution of values may increase the difficulty of caching nodes selection and the instability of the Caching Node Set, causing the loss of network performance.
3.1.2. Node Information
Besides the passing data that caching nodes store to improve data transmission, each node also needs to cache the relevant node information required to build and maintain our scheme. The following node information is cached in each node:
(i) indicates the sum of the votes weight value a node has received.
(ii) indicates the action probability of setting a node as a caching node. Correspondingly, indicates the probability of setting a node as a noncaching node. Whether a node is a caching node or not is finally determined by the value of .
(iii) records the frequency of contacts with other nodes in history.
(iv) records node information of other nodes.
(v) records the indirect votes’ values for other nodes.
(vi) sorts the neighboring nodes ordered by the frequency of contacts.
(vii) records the greatest value of all neighboring nodes.
(viii) records the node ID with the greatest value of all neighboring nodes.
(ix) records the frequency of contacts with the node .
(x) serves as an indicator of whether the node is a caching node.
More specifically, a node allocates a queue to store all necessary information of each contacted node, including the node information , the frequency of contacts , and the indirect vote information to the node . Ideally, all the relevant information would be stored in the queue permanently; however, a network node provides only limited memory for information storage, so obsolete information must be eliminated from the queue. We assign a time stamp to each queue item; the time stamp of the item storing the contacted node information is updated every time the node makes a contact. When the memory is exhausted, the queue eliminates the items with the oldest time stamps. In addition, each queue item has a predefined lifetime and is removed when it expires.
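The two eviction rules for the per-node information queue (oldest time stamp first when memory runs out, plus a fixed lifetime) can be sketched as below. The class `NodeInfoQueue`, its method names, and the use of an item count as the memory limit are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class QueueItem:
    info: Any         # cached information about one contacted node
    timestamp: float  # time of the most recent contact with that node


class NodeInfoQueue:
    def __init__(self, capacity: int, lifetime: float) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.lifetime = lifetime
        self.items: Dict[str, QueueItem] = {}

    def purge_expired(self, now: float) -> None:
        """Remove every item whose lifetime has expired."""
        expired = [n for n, item in self.items.items()
                   if now - item.timestamp > self.lifetime]
        for node_id in expired:
            del self.items[node_id]

    def record_contact(self, node_id: str, info: Any, now: float) -> None:
        """Store or refresh the item for `node_id`, evicting the oldest if full."""
        self.purge_expired(now)
        if node_id not in self.items and len(self.items) >= self.capacity:
            oldest = min(self.items, key=lambda n: self.items[n].timestamp)
            del self.items[oldest]
        self.items[node_id] = QueueItem(info, now)


queue = NodeInfoQueue(capacity=2, lifetime=10.0)
queue.record_contact("A", {"contacts": 1}, now=0.0)
queue.record_contact("B", {"contacts": 1}, now=1.0)
queue.record_contact("C", {"contacts": 1}, now=2.0)  # evicts A, the oldest item
print(sorted(queue.items))
```

Refreshing the time stamp on every contact means frequently met nodes survive eviction, while rarely met nodes are the first to be dropped.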
3.1.3. Information Delivery
A node will transmit the node information when contacting other nodes besides voting. When node contacts node , node delivers the node information of its own to node . When receiving the information, node would record and update the cached information of node . When delivering the node information, node increases the by . is the message containing the node information and contains the following details:(i) indicates the total weight value of votes a node receives.(ii) records the greatest value of all neighboring nodes.(iii) records the node ID with the greatest value of all neighboring nodes.(iv) records the frequency of contacts with the node .(v) indicates whether the node is a caching node.
When receiving the message from node , node records the information of node in . Moreover, node updates its value according to the received information. The direct votes’ value from node is added to and the indirect votes from node to other nodes are cached in . Node updates and as well according to the value of node . , and is updated as the node ID which has the greatest value.
The proposed CNS selecting process based on learning automata starts simultaneously with the network operation and is aimed at selecting a set of nodes as caching nodes to construct CNS and updating CNS in real time to adapt to the change of network topology. It consists of two processes: voting process and updating process, as shown in Figure 5.
The action set of the node includes two actions, setting as a CNS node and setting as a non-CNS node. The node selects its action based on value. The initial is set to 0.5 for each node. Once the action is selected, some nodes become the CNS node. It affects the following voting process. The votes that the nodes get change and the state values also change. This leads to new reinforcement signal and new rewarding/penalizing parameter and then new action probability .
In the voting process, when node makes a contact with node , node starts a voting process which sends the vote of different weight values to node according to the node information and connection conditions between each other. When receiving the vote from node , node updates its node information. Meanwhile, node sends a message containing its own node information to node in the voting process as well. And when receiving the message from node , node updates the cached node information of node . When delivering message, node finishes the voting process. Node finishes the voting process when receiving the message and then updates its own node information.
In the updating process, every node in the network updates its own node state information and value according to the cached node information of neighboring nodes. Each node repeats the updating process at a predefined time interval independently. Equipped with a learning automaton, each node updates the action probability vector based on the reward-penalty algorithm and reinforcement feedback from other nodes. Each node will decide whether to be a caching node based on the action probability vector before finishing the updating process.
These two kinds of processes occur independently. The voting process occurs whenever the node contacts other nodes, and the updating process repeats at a predefined time interval independently. One kind of process is affected by the feedback information from the other. The updating process updates the node state information according to the votes and information messages received in the voting process, while the voting process decides the vote values according to the node state changes in the updating process. The time line of one single node is shown in Figure 6 where indicates that the node makes a contact with another neighboring node and that a voting process occurs. INTERVAL_TIME represents the time interval at which the updating processes occur.
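The interleaving of the two processes on a single node can be sketched as a merged event list: voting events at contact times and updating events at multiples of INTERVAL_TIME. The function `node_timeline`, the value used for `INTERVAL_TIME`, and the example contact times are hypothetical.

```python
from typing import List, Tuple

INTERVAL_TIME = 10.0  # assumed value; the text leaves the interval unspecified


def node_timeline(contact_times: List[float], end_time: float,
                  interval: float = INTERVAL_TIME) -> List[Tuple[float, str]]:
    """Merge contact-driven voting events with periodic updating events."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # A voting process is triggered by every contact.
    events = [(t, "vote") for t in contact_times if 0.0 <= t <= end_time]
    # An updating process is triggered at each multiple of the interval.
    k = 1
    while k * interval <= end_time:
        events.append((k * interval, "update"))
        k += 1
    return sorted(events)


for time, kind in node_timeline([3.0, 12.5, 27.0], end_time=30.0):
    print(time, kind)
```

Each updating event consumes the votes and information messages gathered by the voting events since the previous update, which is how the two processes feed back into each other.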
3.3. Voting Process
The voting process occurs whenever a contact happens. When node contacts node , the pair of nodes starts a voting process. A voting process consists of two parts, sending votes and delivering node information. However, the voting processes which happened in node and node are different. Node , which starts a contact, votes for node and delivers the node information to node , while node receives the votes from node and updates the stored node information of node .
A voting process is shown in Figure 7 where node contacts and votes for node . Generally, three nodes, node , node , and node , are involved in a voting process. And we assume and . The solid lines indicate past contacts between two nodes, while the dashed line indicates possible contacts between them. The values including , , and on the lines indicate the accumulative number of contacts in history. And indicates the direct vote and indicates the indirect vote; the directions of arrows indicate which nodes vote and which ones receive votes. Generally speaking, nodes prefer to give their votes to nodes that they contact frequently and that have a high possibility (depicted as state) of being CNS nodes.
According to the difference of the node information, there are mainly three different scenarios where the distribution of vote weight values is different.
Scenario 1 (the relationship between and is uncertain). In the first scenario, node fails to realize the presence of node ; that is, contains no information of node . It is mainly because that node has never made a contact with node and node receives no message and data from node . In other words, node is unaware of and . This scenario is shown in Figure 8.
This scenario usually occurs in the initial phase of network operation, when there are only a few contacts between nodes and nodes lack information about their neighbors. As the network operates and node information is disseminated, the first scenario occurs less often.
In this scenario, when node contacts node , node increases the by , which records the accumulative number of contacts from node to node in history. Then we determine the total votes’ weight value. We assume the place of node in is . Finally the total votes weight value of vote from node to node is if ; otherwise .In the first scenario, the vote sent from node to node is the direct vote alone and the value of the indirect vote is . Node sends the vote and the node information to node , while when node only sends the node information. The equations for the distribution of votes’ values are shown by (3).
Scenario 2 (). When node is aware of the node information of node , which is contained in , the relationship between and is clear and node is also aware of the node information of node , which is represented by , referring to the node that has the greatest value in the neighborhood.
In the second scenario, we assume . Similar to the first scenario, the total weight value of votes is determined at first. Then the distribution of the direct vote weight value and the indirect vote weight value can be decided according to the accumulative number of contacts between nodes.
Node has the greatest value of all neighboring nodes; therefore the relationship of values among node , node , and node is shown According to (4), the second scenario can be classified into a number of following subscenarios.
Subscenario 2.1 (). If node is in , it indicates that node has contacted node before and the number of contacts is relatively large. According to the contacts history, we can predict that node tends to contact node in the future. Thus, node is more inclined to have the data cached in node and the data transmitted from node to node also tends to be forwarded to node and cached. So the contact between node and node can be also considered as a contact from node to node and at the same time, , which records the number of contacts from node to node , is also increased by . This situation is shown in Figure 9. Then the distribution of the direct vote value and the indirect vote value can be determined.
The value of the direct vote sent from node to node is shown by The value of the indirect vote relayed from node to node through node is shown by It can be argued that the value decides the directions in which data are transmitted and the votes’ value indicates the expectations of the quantity of transmitted data.
If node is not in , it indicates that node hardly or never contacts node . Thus remains unchanged and meanwhile the distribution of the direct vote value and the indirect vote value can be determined. This situation is shown in Figure 10.
The value of the direct vote sent from node to node is shown by
The value of the indirect vote relayed from node to node through node is shown by
Subscenario 2.2 (). This subscenario is shown in Figure 11. According to the relationship between node and node , it can be classified into two situations:
(i). The value of node is the greatest in the neighborhood; that is, .
(ii). The value of node is the same as that of node , which means .
Since is always equivalent to in these two situations; node would tend to have the data cached in node rather than in node in this contact. The distributions of vote values in these two situations are the same. When the total vote value is decided, the distribution of the direct vote value and the indirect vote value can be determined. The equations for the distribution of votes’ values are shown by
Scenario 3 (). Like the second scenario, in the third scenario, node has contacted node and is aware of the node information of node . According to the relationship between and , it can be classified into following situations:
(i). Node and node are two different nodes.
(ii). Node and node refer to the same node or are different nodes but of the same value.
(iii) will be updated as node ; that is, node and node refer to the same node.
The relationship of values of node , node , and node is shown by
According to (10), the third scenario can be classified into a number of following subscenarios.
Subscenario 3.1 (). According to the relationship between node and node , it can be classified into two situations.
If node is in , this situation is shown in Figure 12. Similarly, , which records the number of contacts from node to node in node , is increased by . When the total value of votes is decided, the distribution of the direct vote value and the indirect vote value can be determined. The equations for the distribution of votes’ values are shown by
If node is not in , this situation is shown in Figure 13. It indicates that node hardly or never contacts node . The equations for the distribution of votes’ values are shown by
Subscenario 3.2 (). According to the relationship between node and node , it can be classified into two situations as well. It is shown in Figure 14.
(i). The value of node is the greatest in the neighborhood; that is, .
(ii). The value of node is the same as that of node , which means .
The distribution strategies of vote value are the same. When the total value of votes is decided, node only adds the direct vote to itself rather than votes for node or node . So the equations for the distribution of votes’ values are shown by
The algorithms used in the voting process are given in detail as pseudocode. Algorithm 1 shows the voting process in the node that sends the votes, and Algorithm 2 shows the voting process in the node that receives the votes.
3.4. Updating Process
Each node repeats the updating process at a predefined time interval INTERVAL_TIME independently, during which each node updates its own node state information based on the information obtained from other nodes. In the updating process, nodes will update the action probability vectors and determine whether to set themselves as caching nodes. The updating process fundamentally depends on the learning automaton assigned to each node.
As mentioned, represents the action probability of setting a node as a caching node. Each learning automaton updates the action probability following the learning automata mechanism and the reinforcement signal . Then the state of every node is determined by the action probability and the predefined threshold value of action probability DOR_THRESHOLD. The updating process is shown as follows.
First of all, the node suspends all contacts with other nodes before starting an updating process. Based on the neighboring nodes state information recorded in , the node calculates the average value of neighboring caching nodes and the average value of neighboring noncaching nodes . Meanwhile, nodes also get informed of whether a neighboring node is a caching node according to the value stored in .
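The averaging step at the start of the updating process can be sketched as follows. This assumes each cached neighbor record holds a numeric state value and a caching flag; the function `neighbor_averages`, the tuple layout, and the choice of 0.0 for an empty group are assumptions for illustration.

```python
from typing import Dict, Tuple


def neighbor_averages(neighbors: Dict[str, Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (average state of caching neighbors, average state of noncaching neighbors)."""
    caching = [state for state, is_caching in neighbors.values() if is_caching]
    noncaching = [state for state, is_caching in neighbors.values() if not is_caching]

    def average(values: list) -> float:
        # An empty group is reported as 0.0 (assumed convention).
        return sum(values) / len(values) if values else 0.0

    return average(caching), average(noncaching)


# Hypothetical cached records: node ID -> (state value, is caching node).
cached = {"B": (4.0, True), "C": (2.0, True), "D": (1.0, False)}
print(neighbor_averages(cached))
```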
The reinforcement signal is shown by (14), where represents the number of neighboring noncaching nodes. If the node gets the votes ) and is surrounded by noncaching nodes, it is rewarded because it may be a good caching candidate; otherwise it is penalized since it is surrounded by a caching node already.
According to the reinforcement signals, nodes start to update the action probability . All learning automata reward the action if whereas they penalize the action if . Let be the action probability at instant .
When the action is rewarded, the recurrence equation is shown by
And when the action is penalized, the recurrence equation is shown by
denotes the reward and penalty parameter and determines the amount of increases and decreases of the action probabilities. is shown by and
If the state of a node is 0, this node does not get any vote during the last updating period. In this case, the node cannot get the reward by making the value of same as before. If penalized the value of is set to 0. So the possibility of becoming a CNS node will be reduced significantly. Otherwise, this node does get some votes during the last updating period, and the reward is decided by the ratio of its state value to the maximum state value of its neighbor. It means the node that gets the most votes among its neighbors will be rewarded most.
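One possible reading of the rules above for a two-action automaton is sketched below. It assumes the reward parameter equals the ratio of the node state to the greatest neighboring state, capped at 1. The penalty parameter for a node with a nonzero state cannot be recovered from the text, so it is exposed as the hypothetical argument `penalty_rate`. The function name and the handling of a zero neighboring maximum are also assumptions.

```python
def update_caching_probability(p: float, rewarded: bool, state: float,
                               max_neighbor_state: float,
                               penalty_rate: float = 0.1) -> float:
    """Update the probability of acting as a caching node after one interval."""
    if state == 0:
        # No votes received: a reward leaves p unchanged, a penalty resets it.
        return p if rewarded else 0.0
    if rewarded:
        # Assumed reward parameter: share of the best neighboring state, capped at 1.
        a = min(1.0, state / max_neighbor_state) if max_neighbor_state > 0 else 1.0
        return p + a * (1.0 - p)
    # Assumed linear penalty; the exact parameter is not given in the text.
    return (1.0 - penalty_rate) * p


p = 0.5  # initial action probability stated in the text
p = update_caching_probability(p, rewarded=True, state=2.0, max_neighbor_state=4.0)
print(p)
```

Under this reading, the node with the most votes among its neighbors receives the full reward and moves straight to probability 1, which matches the statement that it is rewarded most.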
The node state is determined by and DOR_THRESHOLD. If , the node is set as a caching node; otherwise the node is set as a noncaching node. This differs from classical learning automata, which select an action randomly according to the action probability vector. With the threshold rule, nodes that are not suitable for caching data get no chance to become CNS nodes. Under random selection, such nodes could still be chosen, which leads to significant performance loss, especially in the beginning period and in periods when the node contact pattern changes. To address this issue, we modified the classical learning automata so that only suitable nodes become CNS nodes. This may yield a suboptimal solution; however, it finds a reasonable solution quickly and may be more appropriate for dynamic DTNs.
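The difference between the threshold rule and classical probabilistic action selection can be illustrated with the sketch below. The value of `DOR_THRESHOLD`, the use of a greater-or-equal comparison, and both function names are assumptions; the text only states that the action probability is compared against the threshold.

```python
import random

DOR_THRESHOLD = 0.7  # assumed value; the text does not give one here


def is_caching_node_threshold(p: float, threshold: float = DOR_THRESHOLD) -> bool:
    """Modified rule: a node caches only if its probability reaches the threshold."""
    return p >= threshold


def is_caching_node_sampled(p: float, rng: random.Random) -> bool:
    """Classical rule: sample the action from the action probability."""
    return rng.random() < p


rng = random.Random(0)
low_p = 0.2  # a node that is poorly suited to caching
sampled_hits = sum(is_caching_node_sampled(low_p, rng) for _ in range(1000))
print(is_caching_node_threshold(low_p), sampled_hits)
```

With sampling, a poorly suited node is still selected in roughly a fifth of the trials, whereas the threshold rule never selects it, which is the behavior the modified automaton relies on.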
Finally, the node removes from the buffer the stored node information items whose lifetime has expired. Once the node state is determined and the obsolete information is eliminated, the node resumes working and contacting other nodes, and the updating process is finished. The CNS comprises all the caching nodes in the network and may change as the network operates. More details about the updating process are shown by the pseudocode in Algorithm 3.