Abstract

With the development of real-time applications, the freshness of information becomes significant, because outdated information is worthless and may even lead to wrong system decisions. The Age of Information (AoI) has therefore been proposed as a metric for the freshness of information. In the Internet of Medical Things (IoMT), which is derived from the requirements of the Internet of Things (IoT) in medicine, high freshness of medical information must be guaranteed. In this paper, we introduce the AoI of medical information when allocating channels for users in the IoMT. Motivated by the advantages of the Deep Q-learning Network (DQN) in resource management for wireless networks, we propose a novel DQN-based Channel Allocation (DQCA) algorithm that provides a channel allocation strategy optimizing a system cost that considers both the AoI and the energy consumption of coordinator nodes. Unlike traditional centralized channel allocation methods, the DQCA algorithm is distributed, as each user performs the DQN process separately. The simulation results show that our proposed DQCA algorithm is superior to the Greedy algorithm and the Q-learning algorithm in terms of average AoI, average energy consumption, and system cost.

1. Introduction

Corona Virus Disease 2019 (COVID-19) had caused more than 2.32 million deaths worldwide by February 8, 2021 [1]. People are forced to stay at home, travel less, and avoid going to crowded places. In this situation, the government, medical staff, and the general public all hope to monitor virus infections such as COVID-19 and isolate infected people in time to avoid large-scale spread of the virus. Besides, people are more concerned about their health than ever before. More and more chronic patients and even healthy people hope to have long-term, effective monitoring of their bodies and to obtain important information about their health as soon as possible. The emergence of the Internet of Medical Things (IoMT) has provided the possibility to solve these problems, and its intelligent monitoring function has gained massive demand around the world [2].

For the COVID-19 virus, Swati Swayamsiddha et al. proposed the Cognitive Internet of Medical Things (CIoMT), a particular case of the IoMT, enabling real-time tracking, remote monitoring of patients, rapid diagnosis, contact tracing and clustering, screening, and monitoring, thus reducing the workload of medical staff and helping prevent and control the spread of the virus [3]. RaviPratap Singh et al. discussed the feasibility of using the IoMT to track, monitor, and analyze data and to provide treatment plans for orthopedic patients in an environment ravaged by COVID-19 [4]. For COVID-19 management, M. A. Mujawar et al. proposed a health monitoring system based on wearable devices and artificial intelligence, which continuously monitors the patient's heartbeat, body temperature, and other parameters through medical sensors and transmits them to cloud storage through a wireless sensor network (WSN). These parameters are used to update the user's health status in real time, and the status is then sent to the medical staff [5].

The IoMT is a vast network system with diverse technologies. This paper studies only the channel allocation problem in monitoring and transmitting human physiological data in the IoMT. During monitoring and transmission, stale data may cause erroneous analysis and evaluation, reduce the accuracy and reliability of system decision-making, and even threaten the safety of users. Therefore, the freshness of information is crucial, and it also occupies an essential position in the design of 6G systems applied to body area networks [6-10]. To effectively describe the freshness of information, this paper introduces the Age of Information (AoI) [11] and studies the channel allocation problem of the IoMT with the AoI as the optimization target.

In recent years, artificial intelligence has become an effective method for solving resource allocation problems that involve massive data processing [12]. As the main approach to artificial intelligence, machine learning has also received tremendous attention. Machine learning uses algorithms to analyze and learn from data in order to make decisions and predictions about real-world events. Among its branches, deep learning is currently the most popular, and it has been well applied in automatic detection [13, 14], case recognition [15-17], environmental monitoring [18], and epidemic prediction [19]. In terms of channel allocation, with the rapid growth of network size and data volume, deep learning can significantly improve the processing speed for a large number of nodes [20-23].

The research content of this paper is the channel allocation problem among users oriented to the optimization of the AoI. The AoI of each coordinator on a user's body, as observed at the gateway, is the number of slots that the latest update received from this coordinator has experienced by the end of each slot. In each time slot, the system pays a cost for the AoI. Our requirement of timely updating the content received by the gateway is reflected in minimizing the payment cost of the whole system. At the same time, this paper adopts a deep learning method to solve the proposed optimization problem. The main contributions of this paper are as follows:
(i) In view of the channel allocation problem of the IoMT, we focus on the timeliness of information while also considering the mobility of nodes. To measure the cost that the system pays for the lack of fresh information at the gateway, we propose a system cost function based on the AoI and the current energy consumption rate of the nodes.
(ii) Based on this cost function, we construct a mathematical model of the optimization problem that minimizes the average cost of channel allocation in the IoMT.
(iii) For the problem raised, we propose a Deep Q-Learning Network (DQN) based channel allocation algorithm, named DQCA, which provides a channel allocation scheme that minimizes the cost while meeting the requirements on node SNR and residual energy.

The rest of the paper is organized as follows. Section 2 provides an overview of the AoI. Section 3 describes the system model and the optimization model of the channel allocation problem in the IoMT. The proposed DQCA algorithm is illustrated in Section 4. The simulation and performance evaluation are presented in Section 5. Finally, we conclude the paper in Section 6.

2. Overview of the Age of Information

With the increasingly developed Internet of Things (IoT), real-time applications are proliferating. Driverless cars, for example, make decisions based on road information detected by sensors, adjust the travel mode of the vehicle, avoid collisions, and ensure safe driving. This type of application requires high timeliness and freshness of data; outdated data will lead to wrong judgments and decisions, and the older the data, the less important and effective it is. To measure the freshness and effectiveness of data, scholars put forward the AoI in 2011 to quantify the freshness of information on a remote system state [11]. The AoI is the time elapsed since the generation of the most recently and successfully received information, which is different from the transmission delay of information. Consider a system with multiple source nodes and one destination node, where each source node collects information and sends it to the destination node regularly. At the destination node, the AoI of each source node can be calculated [24]. Since each source node keeps sending information to the destination node, the AoI of a source node refers to the AoI of the latest information received by the destination node from that source node. In other words, the AoI of a source node is not fixed; it depends on the sending rate of the source node and the rate at which the destination node receives the source node's information. If the destination node has not received new information from a source node, the AoI of that source node increases linearly until the newest information arrives, at which point it drops to the AoI of the latest information.

As shown in Figure 1, $t_i$ is the time that data packet $i$ is generated by node $j$, and $t_i'$ is the time that data packet $i$ is received by the destination. When $t = 0$, the destination node receives data packet 0 from node $j$, so the AoI becomes $\Delta_j(0) = 0 - t_0$. $\Delta_j(t)$ then increases linearly until the destination node receives the latest data packet 1 at $t_1'$, at which time $\Delta_j(t)$ is updated to $t_1' - t_1$. Likewise, we can deduce that $\Delta_j(t)$ drops to $t_2' - t_2$ when the destination node receives the latest data packet 2 at $t_2'$, and so forth.
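To make the sawtooth evolution concrete, the following minimal Python sketch computes the AoI of one source at the destination over time; the timestamps are hypothetical and only illustrate the definition above:

```python
# Minimal sketch of the AoI sawtooth in Figure 1 (hypothetical timestamps).
# gen[i] is the generation time of packet i, rcv[i] its reception time.

def aoi_at(t, gen, rcv):
    """AoI at time t: t minus the generation time of the freshest packet received by t."""
    received = [g for g, r in zip(gen, rcv) if r <= t]
    if not received:
        return float("inf")  # nothing received yet
    return t - max(received)

gen = [0.0, 1.0, 2.5]   # packet generation times
rcv = [0.4, 1.9, 2.8]   # packet reception times
for t in [0.5, 1.0, 1.9, 2.5, 3.0]:
    print(f"t={t:.1f}  AoI={aoi_at(t, gen, rcv):.1f}")
```

Between receptions the printed AoI grows linearly with $t$; at each reception it resets to the age of the newly received packet.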

Antzela Kosta et al. published a review paper on the AoI in 2017, introducing the concept of the AoI in detail and summarizing the early research [25]. Jhunjhunwala P. R. et al. proposed an AoI-aware channel scheduling algorithm for a sensor network with a monitoring station and multiple source nodes. Their algorithm requires the cost function to be non-decreasing, but it does not provide a complete cost function or optimization model [24].

There has been some research on the AoI in the IoT. Abbas Q. et al. studied the importance and optimization of the AoI and energy efficiency in the IoT [26]. Gu Y. et al. studied the average peak AoI under the overlay and underlay schemes in a cognitive-radio-based IoT network [27]. Li J. et al. studied the average peak AoI of time-limited multicast transmission in the IoT; the authors first describe the evolution of the instantaneous AoI, then derive the service time distribution of all possible reception results on IoT devices, and obtain closed-form expressions of the average AoI and the average peak AoI [28]. Azarhava H. et al. proposed a new protocol based on non-orthogonal multiple access (NOMA) in a wireless IoT network with energy-harvesting sensors and limited battery cells; a closed-form equation of the AoI for the entire network is obtained, and the AoI is optimized via the power scheduling parameters [29].

3. System Model and Optimization Model

3.1. System Model

Figure 2 illustrates the topology of the IoMT, which is born out of the IoT and wearable devices. The core of the IoMT is therefore the users, each equipped with several wearable devices involving wireless sensors. These wearable devices on a user's body can detect physiological information (such as blood pressure, pulse, temperature, and electrocardiogram (ECG)) and mobility information (such as location, moving speed, and moving direction). In addition, there is a coordinator on each user's body that collects the information from all the wearable devices on the same body and communicates with the gateway. The physiological information of all users is sent to the gateway and then transmitted on demand to the nurse, doctor, or ambulance through the Internet. In this paper, each user selects a channel from a gateway in each time slot. To describe the problem more conveniently, we first illustrate the notations.

The AoI of each mobile node is defined as the time elapsed, at the end of each slot, since the generation of the latest data of this node received by the gateway, as shown in Eq. (1):

$$\Delta_q^{i,j} = q\tau - t_g \tag{1}$$

where $t_g$ is the generation time of the currently received data frame and $\tau$ is the length of each time slot. Here, we represent the AoI by the specific time rather than the number of time slots, which is more precise. At each time slot, the system pays a cost for the AoI, and the cost is defined as a function of the AoI of all mobile nodes. Since $C(q)$ is the cost paid by the system for the lack of fresh information from the source nodes, it is a non-descending function, as shown in Eq. (2):

$$C(q) = \sum_{j=1}^{M} w_j f\!\left(\Delta_q^{i,j}\right) \tag{2}$$

where $f(\Delta_q^{i,j})$ is defined as the cost function of the AoI of node $j$, and $w_j$ is the weight coefficient, determined by the ratio of the energy consumed by node $j$ to its initial energy $E_j^0$:

$$w_j = \frac{E_j^c}{E_j^0} \tag{3}$$

Among them, $E_j^c$ is the energy consumed by the node when sending a frame of length $l$ over distance $d_{i,j}$, and $\varepsilon_{fs}$ is the energy consumption parameter for free-space transmission:

$$E_j^c = l\left(E_{elec} + \varepsilon_{fs} d_{i,j}^2\right) \tag{4}$$
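As a minimal sketch, the per-slot system cost can be computed as follows; this assumes the model reconstructed in Eqs. (1)-(4) above, and the linear cost function `f` and all numeric values are illustrative assumptions, not the paper's settings:

```python
# Minimal sketch of the per-slot system cost under Eqs. (1)-(4).
# All constants below are assumed values for illustration only.

E_ELEC = 50e-9    # J/bit, basic transmitter electronics energy (assumed)
EPS_FS = 10e-12   # J/bit/m^2, free-space amplifier parameter (assumed)

def tx_energy(frame_bits, dist_m):
    """Eq. (4): first-order radio model, E = l * (E_elec + eps_fs * d^2)."""
    return frame_bits * (E_ELEC + EPS_FS * dist_m ** 2)

def system_cost(aoi, consumed, initial, f=lambda d: d):
    """Eq. (2): weighted, non-descending cost of all nodes' AoI,
    with weights w_j = E_j^c / E_j^0 from Eq. (3)."""
    return sum((c / e0) * f(d) for d, c, e0 in zip(aoi, consumed, initial))

aoi = [0.3, 1.2, 0.7]                                  # AoI per node (s)
consumed = [tx_energy(40_000, d) for d in (5, 12, 8)]  # energy per node (J)
initial = [1.0, 1.0, 1.0]                              # initial energy (J)
print(f"C(q) = {system_cost(aoi, consumed, initial):.3e}")
```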

The mobile node communication complies with the 802.11 standards and adopts OFDM technology. The signal-to-noise ratio of mobile node $j$ communicating with gateway $i$ is defined as follows:

$$\gamma_{i,j} = \frac{p_{i,j} h_{i,j}}{\sigma^2} \tag{5}$$

where $p_{i,j}$ is the transmission power of node $j$ to gateway $i$, $h_{i,j}$ is the channel gain, and $\sigma^2$ is the Gaussian noise power.

3.2. Optimization Model

Based on the cost function in Eq. (2), the channel allocation problem is formulated as minimizing the long-term average system cost. Let $x_{i,j}^k(t) \in \{0,1\}$ indicate whether channel $k$ of gateway $i$ is allocated to node $j$ in time slot $t$:

$$\min_{\{x_{i,j}^k(t)\}} \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} C(t) \tag{6}$$

s.t.

$$\sum_{j=1}^{M} x_{i,j}^k(t) \le 1, \quad \forall i, k, t \tag{7}$$

$$\sum_{i=1}^{N} \sum_{k=1}^{K} x_{i,j}^k(t) \le 1, \quad \forall j, t \tag{8}$$

$$x_{i,j}^k(t) \in \{0, 1\}, \quad \forall i, j, k, t \tag{9}$$

$$\left|\left\{ i \mid \exists j, k: x_{i,j}^k(t) = 1 \right\}\right| \le N, \quad \forall t \tag{10}$$

$$\left|\left\{ k \mid \exists i, j: x_{i,j}^k(t) = 1 \right\}\right| \le K, \quad \forall t \tag{11}$$

$$\sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} x_{i,j}^k(t) \, b_k \le B, \quad \forall t \tag{12}$$

$$\gamma_{i,j}(t) \ge \gamma_{th}, \quad \forall i, j, k: x_{i,j}^k(t) = 1 \tag{13}$$

where $b_k$ is the bandwidth of channel $k$, $B$ is the total channel bandwidth, and $\gamma_{th}$ is the SNR threshold.

Formula (7) indicates that in any time slot $t$, one channel $k$ can be allocated to at most one node $j$. Formula (8) indicates that in time slot $t$, a node can communicate with at most one gateway. Formula (9) indicates whether channel $k$ of gateway $i$ is allocated to user $j$ in time slot $t$: 1 means yes, and 0 means no. Formula (10) indicates that the number of occupied gateways cannot exceed the number of available gateways. Formula (11) indicates that the number of occupied channels cannot exceed the number of available channels. Formula (12) indicates that the occupied channel bandwidth cannot exceed the total channel bandwidth. Formula (13) indicates that the signal-to-noise ratio of a node must be higher than the threshold so as to guarantee the transmission rate.
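For illustration, the core assignment constraints (7)-(9) can be checked mechanically on an allocation tensor; this is a minimal sketch (NumPy assumed, tensor layout chosen for convenience), not the paper's code:

```python
import numpy as np

def feasible(x):
    """Check constraints (7)-(9) for an allocation tensor x of shape (N, M, K),
    where x[i, j, k] = 1 iff channel k of gateway i is allocated to node j."""
    binary = np.isin(x, (0, 1)).all()                     # (9): 0/1 entries
    one_node_per_channel = (x.sum(axis=1) <= 1).all()     # (7): sum over nodes j
    one_link_per_node = (x.sum(axis=(0, 2)) <= 1).all()   # (8): sum over i and k
    return bool(binary and one_node_per_channel and one_link_per_node)
```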

For a network with a small scale and a small total number of channels, the enumeration method can calculate the cost of each user choosing each subchannel of each gateway and then find the subchannel with the lowest cost. However, if there are 1000 users, 5 gateways, and 64 subchannels in the network, the enumeration method requires at least 1000 × 5 × 64 = 320,000 cost evaluations per slot. Thus, for larger networks, the computational complexity is quite high, and it is considerably significant to design a low-complexity algorithm to solve the proposed problem.
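As a reference point, a brute-force baseline along these lines makes the M × N × K evaluation count explicit; the `cost_of` callback is a hypothetical placeholder for the cost evaluation of Eq. (6), not a function defined in the paper:

```python
from itertools import product

def enumerate_allocation(users, gateways, channels, cost_of):
    """Pick, for each user, the (gateway, channel) pair with the lowest cost.
    cost_of(u, g, k) is a placeholder for evaluating the cost in Eq. (6).
    With 1000 users, 5 gateways, and 64 subchannels this performs
    1000 * 5 * 64 = 320,000 cost evaluations per slot."""
    allocation, evaluations = {}, 0
    for u in users:
        best = None
        for g, k in product(gateways, channels):
            evaluations += 1
            c = cost_of(u, g, k)
            if best is None or c < best[0]:
                best = (c, g, k)
        allocation[u] = (best[1], best[2])
    return allocation, evaluations
```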

4. DQCA Algorithm Design

We assume that each user's channel selection is a Markov decision process (MDP): the policy decision and the AoI depend only on the selection in the last time slot. In this network, there are a large number of users, and they move randomly, so it is difficult to obtain an optimal analytical solution of the optimization model above, because the quality of the result depends heavily on the built model and the available computing power. Reinforcement learning is suitable for the channel allocation problem of such a network. On the one hand, it adjusts actions through rewards obtained from the interaction between the user and the environment, which can solve optimization problems whose analytical solutions are hard to obtain; on the other hand, it adapts well to a highly dynamic environment and frequently changing channels. Q-learning and DQN are two typical reinforcement learning algorithms; their flow diagrams are shown in Figures 3 and 4, respectively.

In Q-learning, the agent chooses an action in each state, builds a Q-table, and records the Q-value for each state-action pair. The Q-value is updated according to the reward produced by the selected action. However, since all possible states and actions are enumerated in the Q-table, Q-learning is only suitable for MDP problems with small state and action spaces. When the space becomes large, the Q-table may no longer fit in memory, and the convergence of Q-learning slows down.
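For reference, the tabular update that makes the Q-table grow with the state-action space is a single line; this is a generic sketch, and the learning rate, discount factor, exploration rate, and dictionary layout are assumptions:

```python
from collections import defaultdict
import random

Q = defaultdict(float)          # Q-table: one entry per (state, action) pair
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # assumed hyperparameters

def q_update(s, a, r, s_next, actions):
    """Standard Q-learning update: Q(s,a) += alpha * (TD target - Q(s,a))."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def choose_action(s, actions):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

Every distinct (state, action) pair allocates a new table entry, which is exactly why the approach breaks down for the large state and action spaces considered here.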

Compared with Q-learning, DQN uses an artificial neural network (ANN) to approximate the value function, uses a target Q-network to compute the target value, and uses experience replay to train the reinforcement learning process. DQN only updates the parameters of the neural network rather than the whole Q-table. Therefore, it shortens the convergence time and is more suitable for problems with large state and action spaces. Considering the large number of users and channels, we abandon the Q-table-based Q-learning algorithm and choose DQN to train the network to obtain an approximately optimal solution. Our proposed DQCA algorithm is a channel allocation algorithm based on DQN.

Agent: We define the coordinator node on a mobile user as an agent. As an agent, it trains the neural network according to the network status (number of users, user location, moving speed and direction of users, etc.) to obtain reasonable actions.

System state: Denoted by $s(t)$, including the channel environment and node behavior. The behavior of a node mainly refers to its current position (the mobility of nodes follows the random walk model [24]), and the nearest gateway is selected for access according to the position of the node. The channel environment can be characterized by the signal-to-noise ratio of the node. If node $j$ selects gateway $i$ for data transmission in time slot $t$ and its signal-to-noise ratio is $\gamma_{i,j}(t)$, then $s_j(t) = 1$ if $\gamma_{i,j}(t) \ge \gamma_{th}$; otherwise $s_j(t) = 0$. That is, $s(t) = \{s_1(t), s_2(t), \ldots, s_M(t)\}$.

System action: After node $j$ selects gateway $i$, the system action $a_j(t)$ is defined as the index $k$ of the channel of gateway $i$ selected by node $j$.

Reward: User $j$ uses the immediate reward $r_j(t)$ produced by action $a_j(t)$ at the system state $s(t)$, which is defined as Eq. (16). This reward function ensures that the cost of the AoI is minimized while the signal-to-noise ratio constraint is met.

For each user $j$, we define the Q function $Q_j(s_t, a_t)$ obtained when taking action $a_t$ at state $s_t$, as shown by Eq. (17):

$$Q_j(s_t, a_t) = r_t + \gamma \sum_{s_{t+1}} P\left(s_{t+1} \mid s_t, a_t\right) \max_{a' \in A} Q_j(s_{t+1}, a') \tag{17}$$

where $P(s_{t+1} \mid s_t, a_t)$ is the transition function from state $s_t$ to state $s_{t+1}$, $\gamma$ is a discount factor used to balance the immediate reward and the long-term reward, and $A$ is the set of feasible actions.

Q function and optimal policy: Then the optimal value of the Q function and the optimal policy can be represented as Eq. (18) and Eq. (19), respectively:

$$Q_j^*(s_t, a_t) = \max_{\pi} Q_j^{\pi}(s_t, a_t) \tag{18}$$

$$\pi_j^*(s_t) = \arg\max_{a_t \in A} Q_j^*(s_t, a_t) \tag{19}$$

Target value: To avoid the overestimation brought by using only one set of parameters in the neural network, we use parameters $\theta$ and $\theta^-$ for the predict network and the target network, respectively. Then the target value can be given by Eq. (20):

$$y_t = r_t + \gamma \max_{a'} Q\left(s_{t+1}, a'; \theta^-\right) \tag{20}$$

Loss function: To approximate the Q function, we define the loss function as Eq. (21) to train the weights $\theta$ of the ANN, with $\theta^-$ periodically copied from $\theta$:

$$L(\theta) = \mathbb{E}\left[\left(y_t - Q(s_t, a_t; \theta)\right)^2\right] \tag{21}$$

In DQCA, we first obtain the locations of all nodes and gateways and select the nearest gateway for each node. Then we perform the channel allocation described in Algorithm 1.

Input: Node list, gateway list
Initialization:
 1. Initialize cost and energy to 0.
 2. Initialize step to 1.
For episode = 1 to maximum iteration time T do
 Count = 1;
 Obtain the initial state s_t based on the input.
 1. Repeat:
  (1) Select action a_t (ε-greedy on the predict network);
  (2) Output the next state s_{t+1}, reward r_t, cost c_t, and energy e_t according to the count and action a_t;
  (3) Store transition (s_t, a_t, r_t, s_{t+1}) in the replay memory.
 2. If step > 200 and step % 5 == 0:
   Sample a random minibatch of transitions (s_t, a_t, r_t, s_{t+1}) from the replay memory pool.
  Else:
   Continue.
 3. Update:
   s_t ← s_{t+1};
   cost += c_t;
   energy += e_t;
   step += 1;
   count += 1;
   For each node in the node list:
    If packet size <= 0:
     Break.
 4. Update the target value y_t by Eq. (20).
 5. Perform a gradient descent step on the loss L(θ) in Eq. (21).
 6. Reset θ⁻ = θ every Z steps.
End for
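For concreteness, the following compact PyTorch sketch shows one way the replay sampling and update steps of Algorithm 1 (experience replay, predict network θ, target network θ⁻) could be realized; the layer sizes and hyperparameters are illustrative assumptions, not the paper's settings:

```python
# Sketch of the DQN machinery behind Algorithm 1 (assumed architecture).
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

state_dim, n_actions, gamma = 20, 64, 0.9   # assumed sizes
predict = QNet(state_dim, n_actions)
target = QNet(state_dim, n_actions)
target.load_state_dict(predict.state_dict())   # theta^- <- theta
opt = torch.optim.Adam(predict.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                  # replay memory pool

# A stored transition is (s, a, r, s_next) with s, s_next float tensors of
# shape (state_dim,), a a scalar long tensor, and r a scalar float tensor.

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(list(replay), batch_size)
    s, a, r, s_next = map(torch.stack, zip(*batch))
    with torch.no_grad():                      # Eq. (20): target value y_t
        y = r + gamma * target(s_next).max(dim=1).values
    q = predict(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)        # Eq. (21): loss on theta
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Every Z steps: target.load_state_dict(predict.state_dict())  # reset theta^-
```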

5. Simulation and Performance Evaluation

In this section, we first introduce the simulation setup, then show the simulation results and analyze the performance of the proposed algorithm.

5.1. Simulation Setup

To verify the effectiveness of our proposed algorithm, the Q-learning algorithm and the greedy algorithm are simulated alongside the DQCA algorithm for comparison. The Q-learning algorithm builds a Q-table for each node and finds the action with the maximum Q-value among all available actions. The main idea of the greedy algorithm is to allocate channels in each time slot with the minimum growth of the cost function in the next slot as the optimization objective [24]. This paper compares the three algorithms from three aspects: cost, AoI, and energy consumption. Among them, cost refers to the overall cost of the network, calculated according to formula (6); the average AoI of all nodes is calculated as

$$\bar{\Delta} = \frac{1}{M} \sum_{j=1}^{M} \Delta^{j}$$

and the energy consumption is the average energy consumption of all nodes, defined as

$$\bar{E} = \frac{1}{M} \sum_{j=1}^{M} E_j^c$$

5.2. Performance Evaluation

To verify the effectiveness and feasibility of the proposed DQCA algorithm, this paper uses three different scenarios. In the first, the average size of a data packet is 5 M and the packet arrival interval is 50 ms, while the number of nodes in the network changes; in the second, the number of nodes is 20 and the packet arrival interval is 50 ms, while the average size of a data packet changes; in the third, the number of nodes is 20 and the average size of a data packet is 5 M, while the packet arrival interval changes. The simulation program runs on a computer with an Intel Core i7-3520M CPU at 2.90 GHz and 8 GB of RAM. The parameters used in the simulation are shown in Table 1.

Figures 5-7 study the impact of the number of nodes on network performance when the length of the data packet and the time slot are fixed, as defined in the first scenario. It can be seen from Figures 5 and 6 that the average AoI and average energy consumption of the three algorithms decrease continuously as the number of nodes increases. This is because the total AoI and energy consumption grow more slowly than the number of nodes, resulting in a decrease in the average value. At the same time, due to the large state space, Q-learning consumes more time and computing resources and is therefore inferior to DQCA in terms of AoI and energy consumption. In particular, compared with the Q-learning algorithm, the average energy consumption of all nodes under the DQCA algorithm is reduced by about 38.56%. It can be seen from Figure 7 that the costs of the three algorithms all increase with the number of nodes, among which the DQCA algorithm increases slowly and by a small amount. The cost takes into account both the AoI and the energy consumption of the nodes, and the DQCA algorithm has advantages in both aspects over the other two algorithms. Therefore, its total cost is significantly lower than those of the Greedy and Q-learning algorithms and can be reduced by up to 57.3% compared with the Greedy algorithm.

Figures 8-10 fix the number of nodes and the packet arrival interval as in the second scenario to study how the network performance changes with the size of the data packet. It can be seen that as the size of the data packet increases, the AoI and energy consumption of the nodes also increase, so the cost increases accordingly. This is because, with larger data packets, the processing and transmission time of a packet increases, the gateway waits longer for the latest update of a node, and the energy consumption of the transmitters and receivers of the nodes increases accordingly. Compared with the greedy algorithm and the Q-learning algorithm, the DQCA algorithm reduces the cost by about 62% and 60%, respectively.

Figures 11-13 fix the number of nodes and the size of the data packet as in the third scenario to study how the network performance changes with the packet arrival interval. It can be seen that when the packet arrival interval increases, the number of data packets in the network decreases, fewer packets are sent and received by each node, and so the energy consumption of the nodes is reduced. As the packet arrival interval increases, the probability of a node being allocated a channel at the gateway also increases; that is, the waiting time for an assigned channel is shortened. As can be seen from Figure 11, the AoI of the nodes is reduced overall. From the simulation results in Figure 13, as the packet arrival interval continues to increase, its impact on the average energy consumption and cost of the nodes gradually decreases, and the curves in Figures 12 and 13 tend to stabilize. This is because, once the packet arrival interval increases to a certain extent, the basic energy consumption of a node accounts for a larger proportion of the total energy consumption, and the energy consumption is less affected by the sending and receiving of data packets.

The greedy algorithm only considers the optimal value of the current function; it considers neither previous choices nor the consequences of the current choice. In practice, this method often does not produce the best results. Therefore, in Figures 5-13, the greedy algorithm has the worst performance compared with Q-learning and DQCA.

6. Conclusion

Focusing on the freshness of information in the IoMT, this paper studied the channel allocation problem oriented to the AoI. The system cost is defined as a non-descending function of the AoI and the energy consumption of nodes. Since the system cost optimization problem is difficult to solve due to the large number of users and their mobility, we adopted a DQN-based method named the DQCA algorithm. The simulation compared the proposed DQCA algorithm with the Greedy algorithm and the Q-learning algorithm in three different cases. The simulation results show the superiority of the DQCA algorithm in terms of the average AoI, the average energy consumption of nodes, and the system cost.

Notations

N: the total number of gateway nodes
M: the total number of nodes
K: the total number of subchannels
i: the gateway index, $i \in \{1, \ldots, N\}$
j: the node index, $j \in \{1, \ldots, M\}$
k: the channel index, $k \in \{1, \ldots, K\}$
$\mathcal{K}$, $\mathcal{N}$, $\mathcal{M}$: the set of channels, the set of gateways, and the set of nodes, respectively
q: the time slot sequence number
$t_q$: the time when the data frame is received in time slot $q$
$\Delta_q^{i,j}$: the AoI of node $j$ communicating with gateway $i$ in time slot $q$
$l$: the length of the frame sent
$E_{elec}$: the basic energy consumption of the transmitter
$\varepsilon_{fs}$: the energy consumption parameter for free-space transmission
$d_{i,j}$: the distance between mobile node $j$ and gateway $i$
$\gamma_{i,j}$: the signal-to-noise ratio of the mobile node
$p_{i,j}$: the transmission power of node $j$ to gateway $i$
$h_{i,j}$: the channel gain
$\sigma^2$: the Gaussian noise power
$\gamma_{th}$: the SNR threshold, which can be set on demand

Data Availability

The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Conflicts of Interest

The author(s) declare(s) that they have no conflicts of interest.

Acknowledgments

National Natural Science Foundation of China, Grant/Award Number: 61501308; Basic research project of Liaoning Provincial Department of Education, Grant/Award Number: LG202027; Postdoctoral Research Station project of Shenyang Ligong University.