Abstract

Mobile edge computing (MEC) has emerged as a novel computing paradigm that makes use of resources close to the devices of the smart rail system. Nevertheless, the limited coverage of the stations equipped with MEC servers makes it difficult for trains to offload data to the stations directly. Therefore, a multi-hop ad hoc network is introduced in this case. In this paper, an improved architecture is proposed for the MEC-based smart rail system by combining blockchain and multi-hop data communication. Requesting trains can offload tasks to MEC servers through multi-hop transmission between trains, even when the requesting trains are not covered by the servers. Furthermore, we utilize blockchain technology to guarantee the authenticity and anti-falsification of information during multi-hop transmission. Then, the offloading routing path and offloading strategy are co-optimized to minimize both the delay and the cost of the system. The proposed optimization problem is formulated as a Markov decision process (MDP) and solved by deep reinforcement learning (DRL). In comparison with other existing schemes, simulation results demonstrate that the proposed scheme can greatly improve system performance.

1. Introduction

As the smart rail system continues to grow, it is urgent to realize the dynamic aggregation, deep mining, and effective utilization of various application data by building high-performance ubiquitous computing power. Cloud computing was initially employed to address this issue because of the constrained processing power of the trains [1]. However, cloud computing architecture clearly cannot meet the real-time requirements for information processing in the smart rail system, on account of the rapid mobility of trains [2, 3]. Fortunately, mobile edge computing (MEC), as an emerging technology, effectively solves the issue mentioned above. MEC performs computing tasks on edge servers close to the device rather than on the cloud, which meets delay-sensitive requirements and, at the same time, brings high-quality services to users [4, 5].

However, the coverage of the stations equipped with MEC servers is limited, so it is impractical to assume that the trains are always within the range of the MEC servers. A multi-hop ad hoc network has no fixed topology: train nodes can spontaneously create a wireless network to exchange information and data with other trains, and each train acts not only as a transceiver but also as a router [6]. Therefore, we consider integrating the multi-hop ad hoc network with MEC technology. Requesting trains can offload tasks to MEC servers by multi-hop transmission between trains, which enables the servers to be utilized over a wider range while meeting the low-latency requirement.

Although combining a multi-hop ad hoc network with MEC in the smart rail system brings great advantages, a large amount of traffic- and driving-related information is involved, so how to effectively guarantee security and reliability during multi-hop data transmission is worth considering. Fortunately, thanks to its distributed, immutable, and secure nature, blockchain is applicable to prevent traffic- and driving-related information from being leaked or manipulated in the MEC-enabled smart rail system with multi-hop connections [7–11].

However, there are still significant obstacles to overcome before multi-hop ad hoc networking and blockchain can be effectively applied in the MEC-enabled smart rail system. For instance, how to properly select the routing path and the offloading decision under the high-speed movement of trains is a crucial problem. In addition, how to balance the delay and cost caused by data delivery, offloading, and consensus in the MEC-enabled smart rail system also needs to be considered.

In this paper, to deal with the aforementioned issues, we propose an improved optimization framework for the MEC-enabled smart rail system based on multi-hop data communication and blockchain. The offloading routing path, offloading strategy, and block size are co-optimized to minimize both the delay and the cost of the system during the communication and computation process. Furthermore, by specifying the state space, action space, and reward function, a discrete Markov decision process (MDP) is formulated to characterize the proposed dynamic joint optimization problem. Additionally, we utilize a dueling deep Q-network (DQN) to obtain the optimal strategy.

The rest of this paper is structured as follows. Section 2 presents the system model. Then, we formulate the collaborative optimization problem in Section 3. In Section 4, the formulated problem is solved by the dueling DQN algorithm. The experimental results are presented and discussed in Section 5. Section 6 concludes the paper and outlines future directions.

2. System Model

In this section, we depict the system model, which consists of the network model, multi-hop routing path model, communication model, computation model, and blockchain model.

2.1. Network Model

In Figure 1, the architecture of a high-speed railway and a train station equipped with MEC servers managed by various suppliers is shown. The available computing resource and the price of each MEC server differ in a real-time environment. We denote the set of these MEC servers as . Several high-speed trains run on the tracks, and we denote the set of all trains as . A multi-hop ad hoc network is utilized to assist requesting trains in computation task offloading: trains can act as relaying nodes and realize information interaction with other trains by spontaneously creating a wireless network.

There may be malicious relaying nodes when computation tasks are offloaded via relaying. Therefore, a trust-based blockchain system is utilized to ensure the authenticity and anti-falsification of data during the relaying and offloading process. After receiving the relaying task, the last-hop relaying train sends the data consensus requirement and transaction information to the blockchain system for transaction verification. Through the consensus mechanism, the requesting train node and the other relaying train nodes in the routing path check the information data. In the blockchain system, all trains are regarded as blockchain nodes. These nodes can play either a normal or a consensus role: normal trains are in charge of transferring and accepting ledger information, while consensus trains are in charge of creating new blocks and carrying out the consensus process. Each relaying train in the routing path is regarded as a candidate consensus node, and we consider the trust value of each relaying train when voting for consensus.

In this work, requesting trains make full use of the multi-hop ad hoc network to offload tasks to the MEC servers, even if the trains are not within the communication range of the servers. A consensus process is initiated when the last relaying train receives the offloading task, and the security of the information is guaranteed once all consensus nodes reach consensus successfully. For each task, two important elements are considered: latency and cost. In terms of latency, considering the link quality along the whole routing path and the processing capability of the servers, the total expected latency of one successful end-to-end transfer and calculation is evaluated. In terms of cost, the total expected cost is assessed, including the data relaying cost of each relaying train along the path and the computing cost of the different MEC servers. As a result, we can select the optimal routing path and offloading decision with the minimal latency and cost.

2.2. Multi-Hop Routing Path Model

Firstly, to characterize the performance of each pair of trains in the multi-hop routing path, we build a link model that considers channel fading and the mobility of trains. Then, based on the link quality obtained above, a routing metric that accounts for link correlation is utilized to select the optimal multi-hop routing path.

2.2.1. Link Quality

We utilize the Nakagami distribution model to represent the fading of radio wave propagation [12]. Thus, the probability of successful delivery between a sender train and a receiver train under channel fading can be obtained as follows, where is the cumulative distribution function of the received signal power, is the reception threshold of a signal, and is the average signal strength. The fading parameter is related to the distance between the two trains at the current time as follows:
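Because the expressions themselves are not reproduced in this version, the following is a minimal sketch in standard Nakagami-m notation; every symbol (received power P_r, reception threshold P_th, average power Omega, fading parameter m, and inter-train distance d_ij(t)) is an assumption rather than the paper's own notation:

\[
P^{succ}_{ij}(t) = \Pr\left[P_r \ge P_{th}\right] = 1 - F_{P_r}(P_{th}) = \frac{\Gamma\left(m, \frac{m P_{th}}{\Omega}\right)}{\Gamma(m)}, \qquad m = m\left(d_{ij}(t)\right),
\]

where \Gamma(\cdot,\cdot) is the upper incomplete Gamma function and the fading parameter is a function of the current distance between the two trains.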

In a real operating environment, a train does not always run at a constant speed; its speed changes with acceleration or deceleration. In this case, the movement of the trains can be abstracted as a Wiener process [13]. Assuming that the trains move in only two directions along the track, one direction is designated as positive. Therefore, the velocity variation of a train during an interval can be defined as follows, where and represent the velocity of the train at the two time instants, the drift parameter denotes the acceleration or deceleration of the train, and the remaining parameter follows a Gaussian distribution. The relative distance variation of two trains in a period can then be obtained, and the distance between the two trains at the current time is evaluated under two circumstances. Circumstance 1: the two trains are moving in the same direction. Circumstance 2: the two trains are running in opposite directions. Here, denotes the distance between the two trains at time . Thus, we can predict link availability and obtain the probability of link availability on a link, where is the communication range of trains. Therefore, according to equations (1) and (7), we can obtain the link quality of a link, which represents the probability of successful transmission of the packets.
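As a hedged sketch of the mobility and link-availability relations described above (the symbols a_i, sigma, Delta t, R, and d_ij are assumptions, not the paper's notation):

\[
\Delta v_i = a_i \,\Delta t + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2 \Delta t), \qquad
P^{avail}_{ij}(t) = \Pr\left[\, d_{ij}(t + \Delta t) \le R \,\right], \qquad
q_{ij}(t) = P^{succ}_{ij}(t) \cdot P^{avail}_{ij}(t),
\]

where the velocity change follows the Wiener-process assumption, R is the communication range, and the link quality q_ij combines the delivery probability of equation (1) with the availability probability of equation (7).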

2.2.2. Routing Path Quality

Once a data packet loss occurs on one link, the packet must be retransmitted from the source to the destination, on account of the retransmission mechanism in the transport layer. As a result, packets on different multi-hop paths experience different numbers of retransmissions and different consumption of network resources. This phenomenon is referred to as link correlation. Thus, the expected retransmission probability of the -hop path can be calculated as follows, where is the link quality between the source train and the first relaying train and represents the link quality between two consecutive relaying trains in the routing path.

Therefore, we define the expected number of data transmissions in an -hop routing path, i.e., the expected number of attempts needed until one packet is successfully transferred from source to destination, as follows, where is the aggregation of the link qualities of all links in the -hop routing path.
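Under the end-to-end retransmission assumption above, a packet traverses the whole n-hop path successfully only if every link succeeds. A hedged sketch of the two expressions is (q_k denoting the assumed quality of the k-th link):

\[
P^{loss}_{path} = 1 - \prod_{k=1}^{n} q_k, \qquad
E[\text{transmissions}] = \frac{1}{\prod_{k=1}^{n} q_k},
\]

that is, the expected number of end-to-end attempts is the reciprocal of the product of the link qualities along the path.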

2.3. Communication Model

In this section, we describe the communication process of the system, including the representation of the communication latency and relaying cost.

2.3.1. Communication Delay

The communication delay consists of three parts: the requesting train offloading the task to the last-hop relaying train through the multi-hop ad hoc network, the last-hop relaying train initiating a consensus after receiving the task, and the last-hop relaying train offloading the task to the MEC server.

Firstly, the requesting train offloads the task to the last-hop relaying train through the multi-hop ad hoc network. The set of relaying trains in the routing path for the task (excluding the requesting train) is denoted as , and . The data transmission rate of each hop between two relaying trains is obtained as follows, where represents the channel bandwidth, is the transmit power, indicates the path-loss exponent, and is the background noise power. Thus, the delivery delay of the task over the V2V -hop connections can be expressed as follows, where denotes the input data size required by the task, is the transmission rate between the source train (i.e., the requesting train) and the first relaying train, and is the transmission rate between two consecutive relaying trains.
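As a hedged illustration (assuming a log-distance path-loss model and the symbols B, p, alpha, sigma^2, D_m, and per-hop distance d_{k,k+1}; the paper's exact notation is not shown here), the per-hop rate and the multi-hop delivery delay take the form

\[
r_{k,k+1} = B \log_2\left(1 + \frac{p\, d_{k,k+1}^{-\alpha}}{\sigma^2}\right), \qquad
T^{V2V}_m = \sum_{k=0}^{n-1} \frac{D_m}{r_{k,k+1}},
\]

where the expected-transmission factor from Section 2.2.2 may additionally scale the sum when end-to-end retransmissions are accounted for.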

Secondly, the last-hop relaying train sends the data to the blockchain system for transaction verification after receiving the relaying task, so as to guarantee that the data are authentic and untampered. The delay generated by the consensus process is defined as , which will be described in detail in the blockchain model.

Finally, the last-hop relaying train offloads the task to the MEC server managed by one of the suppliers through wireless communication. The Shannon–Hartley theorem is used to estimate the uplink rate for data transmission from the last-hop relaying train to the MEC server via the LTE cellular network, and it can be calculated as follows, where represents the channel bandwidth, represents the background noise power, denotes the transmission power of the train (all trains have the same transmitting power), and is the channel gain between the train and MEC server .

Therefore, the transmission delay caused in this process is calculated as

Combining the above, the total delay of the communication process is obtained as the sum of the three parts.
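A hedged sketch of the remaining communication-delay terms (the symbols h_{n,s}, B, p, sigma^2, D_m, and T^{cons} are assumptions):

\[
r_{n,s} = B \log_2\left(1 + \frac{p\, h_{n,s}}{\sigma^2}\right), \qquad
T^{off}_m = \frac{D_m}{r_{n,s}}, \qquad
T^{comm}_m = T^{V2V}_m + T^{cons} + T^{off}_m,
\]

i.e., the total communication delay is the sum of the multi-hop delivery delay, the consensus delay, and the train-to-server offloading delay.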

2.3.2. Communication Cost

We assume that each train has its own relaying price [14]. Corresponding to the relaying trains in the routing path for the task , the sequence of relaying prices (per unit data volume) is . Therefore, the total train relaying cost can be obtained as follows.
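For example, if rho_k denotes the (assumed) relaying price per unit data of the k-th relaying train and D_m the input data size of task m, the relaying cost can be sketched as

\[
C^{relay}_m = D_m \sum_{k=1}^{n} \rho_k .
\]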

2.4. Computation Model

Assume that each MEC server operated by a different supplier has its corresponding processing capacity and price for computing the offloading task. The computing capacity and price (per unit task complexity) of MEC server are and , respectively. Then, the calculation delay and cost are presented as follows, where denotes the required CPU cycles for task .

Thus, the total delay and cost, including the offloading delivery process and the calculation process, are represented as follows.
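As a hedged sketch of the computation terms and the resulting totals (f_s, p_s, and c_m are assumed symbols for the computing capacity of server s, its unit price, and the required CPU cycles of task m):

\[
T^{comp}_m = \frac{c_m}{f_s}, \qquad C^{comp}_m = p_s\, c_m, \qquad
T^{total}_m = T^{comm}_m + T^{comp}_m, \qquad C^{total}_m = C^{relay}_m + C^{comp}_m .
\]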

2.5. Blockchain Model

In this paper, the delegated Byzantine fault tolerance (dBFT) consensus mechanism is adopted in our blockchain system to increase the efficiency of the consensus process while preventing tampering [15]. Moreover, each relaying train in the routing path is regarded as a candidate consensus node, and we consider the trust value of the candidates to determine the nodes participating in the next round of consensus, which improves the throughput of the blockchain, reduces the CPU cycles for transaction confirmation, and thus effectively reduces the consensus latency [16]. The higher the trust value of a relaying node, the more likely it is to be selected as a consensus node. The set of selected consensus nodes is denoted by , with . The dBFT consensus protocol can dynamically adapt to changes in the number of train nodes in the proposed system model [13].

2.5.1. Calculation of Trust Value

Generally, the trust value is determined by a direct trust value and an indirect trust value [17, 18]. The trust value of a train node is defined as , with . Similar to [19], the threshold of the trust value is set to 0.5. A node is trustworthy enough to be a candidate for consensus only if its trust value is higher than 0.5.

We utilize subjective logic to compute the direct trust value of the blockchain nodes, which can be obtained as follows, where is the node honesty (NH), represents the uncertainty during offloading due to unstable and noisy communication channels between the relaying trains [18], and is the remaining node capacity (NC) of the train to complete the task.

For the computation of the indirect trust value, the number of times a node has been voted for consensus in the past is taken into account. The blockchain system regularly updates and records the selection of consensus nodes. Thus, the indirect trust value of a blockchain node can be defined as follows, where is the total number of consensus processes and is the number of times the relaying train has been voted for consensus.

Therefore, the trust value of a candidate for consensus is represented as follows, where and represent the weights of the direct and indirect trust values, respectively; both weights lie between 0 and 1 and sum to 1.
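A hedged sketch of the trust aggregation described above (symbol names are assumptions):

\[
T^{ind}_i = \frac{n_i}{N}, \qquad
T_i = w_d\, T^{dir}_i + w_{ind}\, T^{ind}_i, \qquad
w_d + w_{ind} = 1, \ \ w_d, w_{ind} \in [0,1],
\]

where n_i is the number of times train i has been voted for consensus, N is the total number of consensus processes, and train i becomes a candidate consensus node only if T_i exceeds the threshold of 0.5.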

2.5.2. Consensus Process

The specific dBFT consensus process is depicted in Figure 2. We assume that generating/validating one signature and message authentication code (MAC) requires and CPU cycles, respectively.

At first, the last-hop relaying train sends the consensus requirement and transaction information to the blockchain system upon receiving the offloading task. Then, the speaker of this round of consensus is assigned by the blockchain. The assigned speaker packages the hash of the transactions into a prepare request message to launch a proposal and broadcasts it to initiate a new consensus. During this phase, one signature and MACs are generated by the speaker. Thus, the computation cycles for the speaker node in this process are represented as

Secondly, each member collects all the transaction information of the prepare request message. If the transactions are verified successfully, the members add the transactions to the consensus module and broadcast prepare response messages to all consensus nodes. During this phase, members need to validate the signatures and MACs of the proposal and the contained transactions and then generate one signature and MACs to form the prepare response messages. Therefore, the computation cycles for the member nodes are calculated as follows, where is the total transaction batch size at time slot and represents the average size of transactions.

Then, if at least prepare response messages are received before the timeout, each consensus node first verifies whether the messages are correct. Once the verification is successful, commit messages are broadcast to the other consensus nodes. During this phase, the consensus nodes verify signatures and MACs and then generate one signature and MACs to form the commit messages. Thus, for each consensus node, the consumed CPU cycles can be represented as

Finally, if the consensus nodes have collected more than commit messages and verified successfully, the consensus process is regarded as completed. At the same time, one block is produced and broadcast to blockchain system. During this phase, signatures and MACs should be verified by one consensus node. Thus, the computation cycles for each consensus node in this process are represented as

Based on the above analysis, the delay of the consensus process is represented as follows, where denotes the computing capacity of the speaker, represents the computing capacity of consensus node , is the block generation interval, and is the broadcast delay between nodes.
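To make the consensus-delay accounting concrete, the sketch below estimates the dBFT latency from the per-phase signature/MAC cycle counts described above. It is illustrative only: the function name, parameter names, and the exact per-phase counts are assumptions, not the paper's expressions.

# Hedged sketch of the dBFT consensus-delay estimate described above.
# All parameter names and the per-phase signature/MAC counts are assumptions
# made for illustration; the paper's exact expressions are not reproduced here.

def consensus_delay(
    n_consensus,        # K: number of selected consensus nodes
    f_speaker,          # computing capacity of the speaker (cycles/s)
    f_nodes,            # computing capacities of the other consensus nodes
    batch_size_bytes,   # total transaction batch size in the block
    avg_tx_bytes,       # average transaction size
    cyc_sign,           # CPU cycles to generate/verify one signature
    cyc_mac,            # CPU cycles to generate/verify one MAC
    block_interval,     # block generation interval (s)
    broadcast_delay,    # per-phase broadcast delay between nodes (s)
):
    n_tx = batch_size_bytes / avg_tx_bytes          # transactions in the block
    k = n_consensus

    # Phase 1 (prepare request): the speaker signs the proposal and creates
    # one MAC per receiver (assumed k - 1 MACs).
    cyc_speaker = cyc_sign + (k - 1) * cyc_mac

    # Phase 2 (prepare response): each member verifies the proposal and the
    # contained transactions, then signs and authenticates its own response.
    cyc_member = (cyc_sign + cyc_mac + n_tx * cyc_sign) + cyc_sign + (k - 1) * cyc_mac

    # Phase 3 (commit): verify collected responses, then sign/authenticate a commit.
    cyc_commit = (k - 1) * (cyc_sign + cyc_mac) + cyc_sign + (k - 1) * cyc_mac

    # Phase 4 (finalize): verify collected commits before producing the block.
    cyc_final = (k - 1) * (cyc_sign + cyc_mac)

    # The slowest node in each phase bounds that phase's computation time.
    t_compute = (cyc_speaker / f_speaker
                 + max((cyc_member + cyc_commit + cyc_final) / f for f in f_nodes))

    # Four message rounds are assumed to each incur one broadcast delay.
    return t_compute + 4 * broadcast_delay + block_interval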

3. Problem Formulation

In this section, we jointly optimize the routing path selection, offloading decision, and block size selection in a real-time environment so as to decrease the delay and cost of the proposed network. The optimization problem is characterized as an MDP by identifying the state space, action space, and reward function.

3.1. State Space

During each time slot , we define the state space as a union of the link quality between each pair of all trains , relaying price of all trains , computing resource of MEC servers , and computing price of MEC servers , which is represented as

3.2. Action Space

The action space consists of the routing path selection, the offloading decision, and the block size selection. Formally, the action space is denoted as follows, where is the set of relaying trains arranged in consecutive routing order in the multi-hop routing path, indicates the offloading decision ( represents that the task is executed on the MEC server managed by the first supplier, while indicates that the task is executed on the MEC server managed by the -th supplier), represents the block size level, and is the maximum block size.

3.3. Reward Function

We define the reward function to improve system performance and devise the immediate reward as follows, where and are the weights of the latency and the cost, respectively, with , and is the penalty value.

In this problem, indicates the time limitation for completely offloading tasks, where is the maximum tolerable delay. denotes the latency limitation for completing a block, where . The maximum size of all transactions in a single consensus process is indicated by .
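A hedged sketch of the immediate reward under these constraints (the weights w_1 and w_2, the penalty phi, and the limits are assumed symbols):

\[
r_t =
\begin{cases}
-\left( w_1\, T^{total}_m + w_2\, C^{total}_m \right), & \text{if the delay and block-latency limits are met},\\
-\phi, & \text{otherwise},
\end{cases}
\qquad w_1 + w_2 = 1 .
\]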

4. Problem Solution

In this paper, due to the highly dynamic characteristics of the proposed system, we adopt the dueling DQN algorithm to solve the proposed joint optimization problem. Dueling DQN is widely considered a significant improvement over conventional DQN. Different from natural DQN, dueling DQN divides the Q-network into two parts, an action advantage function that is independent of the state value and a state-value function , which are calculated separately [20, 21]. It is easy to find which action yields better feedback by learning . Finally, we obtain the output of the dueling DQN network by merging the two fully connected streams, which is denoted as follows, where is the convolution layer parameter, represents the parameter of the fully connected layer of the state-value function, and denotes the parameter of the fully connected layer of the action advantage function. However, there is an identifiability problem in equation (32): the respective contributions of the state-value function and the action advantage function to the final Q value cannot be distinguished. To address this problem, dueling DQN forces the expected value of the action advantage function over the actions to be zero and implements the forward mapping of the last module of the network, which is written as follows.
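Since the equation is not reproduced here, a common form of this mean-subtraction aggregation, in the notation of the original dueling DQN work (assumed to correspond to equation (33)), is

\[
Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right),
\]

which forces the advantages to have zero mean over the action set and thereby makes the contributions of V and A identifiable.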

The separation of environmental state value and action advantage in dueling DQN solves the problem of repeated calculation of the same state value, enhancing the capability of estimating the environmental state with a clear optimization objective [22]. Therefore, we adopt dueling DQN in our proposed network to decrease computational complexity and training time.

Finally, the training process is formally described in Algorithm 1.

(1)Initialization:
 Initialize the experience memory and the mini-batch size ;
 Initialize evaluated network with the weight and bias set ;
 Initialize target network with the weight and bias set ;
 Initialize the greedy coefficient ;
(2) for each training episode do
(3)  Reset the state of the trains and MEC servers with a random initial observation ;
(4)  for each time step do
(5)   Randomly choose a probability ;
(6)   if the probability is less than the greedy coefficient then
(7)    Randomly choose an action (exploration step of the ε-greedy policy);
(8)   else
(9)    Choose the action with the maximum evaluated Q value;
(10)   end if
(11)   Execute the action, obtain the reward , and proceed to the next observation ;
(12)   Store the experience tuple into the experience replay memory;
(13)   Randomly sample a mini-batch of experience tuples from the experience replay memory ;
(14)   Obtain the two streams of the evaluated network (state value and action advantage) and merge them through equation (33);
(15)   Obtain the target Q value from the target network (the immediate reward plus the discounted maximum target Q value of the next observation);
(16)   Train the evaluated network to minimize the loss function (the mean squared error between the target Q value and the evaluated Q value);
(17)   Every several training steps, update the target Q-network with the weights of the evaluated Q-network;
(18)   Update the current observation to the next observation ;
(19)  end for
(20)end for
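For readers who prefer code, the following is a minimal PyTorch sketch of the dueling Q-network used by Algorithm 1. The layer sizes, state/action dimensions, and names are assumptions for illustration; only the two-stream structure and the mean-subtraction merge of equation (33) reflect the method described above.

# Hedged sketch of a dueling Q-network such as the one used in Algorithm 1.
# Layer sizes, state/action dimensions, and variable names are assumptions.
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # Shared feature layers (playing the role of the shared/convolution layers).
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # State-value stream V(s) and action-advantage stream A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, action_dim)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.features(state)
        v = self.value(x)                 # shape: (batch, 1)
        a = self.advantage(x)             # shape: (batch, action_dim)
        # Mean-subtraction merge, as in equation (33): forces the advantages
        # to have zero mean so the value and advantage streams are identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

The evaluated and target networks in Algorithm 1 would share this architecture, with the target network's weights periodically copied from the evaluated network.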

5. Simulation Results and Discussion

In this section, we demonstrate the effectiveness of the proposed scheme through simulation experiments. Firstly, the simulation environment and parameters are presented. Then, we analyze and discuss the results and the performance of the proposed framework.

5.1. Simulation Parameters

In the simulation experiments, we consider the network scenario with five trains running on the track, as well as two MEC servers managed by different suppliers. Furthermore, we summarize other significant simulation parameters in Table 1.

In order to assess how well the proposed framework performs, we consider five comparison schemes as follows: (1) a scheme without path selection, in which the routing path is picked at random; (2) a scheme without offloading selection, in which the MEC server that executes the computing task is chosen at random; (3) a scheme with fixed block size, in which the size of the created blocks is fixed; (4) a natural DQN-based scheme, in which the formulated problem is solved by natural DQN; (5) a PBFT-based scheme, in which all blockchain nodes participate in the consensus process.

5.2. Performance Comparison of Convergence

The convergence of the proposed optimization framework under various learning rates is shown in Figure 3. As can be seen from this figure, the learning rate of 10^-1 performs better than the other settings. This is because the large learning rate (10) might fall into a local optimum and fail to obtain the globally optimal solution of the proposed problem, whereas the small learning rate (10^-7) leads to a slow convergence rate and takes longer to find the optimal value. Hence, in this paper, the learning rate is carefully selected and set to 10^-1.

As shown in Figure 4, we examine the convergence performance under different algorithms. It can be observed that dueling DQN reaches a higher system reward and performs more stably than the scheme with natural DQN. The reason is that the chosen routing path, the selected MEC server, and the selected block size have little effect on the state transitions in our scenario, so separately estimating the state value gives the dueling DQN agent an advantage when making decisions.

Figure 5 depicts the comparison of the system reward with training steps under our proposed dBFT-based scheme and the PBFT-based scheme. We can see that the dBFT-based scheme obtains a higher total reward. This is because all nodes need to participate in the consensus process under the PBFT-based scheme, whereas only a subset of trusted nodes participates in the consensus under the dBFT-based scheme. The dBFT algorithm reduces the computation cycles and improves the efficiency of the consensus process. Therefore, our proposed dBFT-based scheme is more suitable for the smart rail system with high-speed mobility.

5.3. Performance Comparison of Different Aspects

Figure 6 presents the relationship between the total latency and the task data size under different schemes. One observation is that the total latency under all schemes increases as the task data size grows, because a larger task takes longer for end-to-end delivery and offloaded computation. Moreover, the total latency under our proposed scheme is consistently lower than that of the others, because our scheme simultaneously optimizes the routing path selection, offloading strategy, and block size selection, whereas the baselines optimize only a subset of these items.

Figure 7 illustrates the comparison of the total cost with the task data size under various schemes. We can see that the total cost of all schemes grows as the task data size increases. Furthermore, our proposed scheme is superior to the schemes without routing path selection and without offloading selection. Nevertheless, the scheme with a fixed block size performs better than the proposed scheme in this respect, because the link quality fluctuates with the high-speed movement of trains.

As shown in Figures 8 and 9, we examine the system weighted expense and the total latency under different block intervals. The system weighted expense is composed of the overall system latency and cost. It can be observed that all the schemes incur a higher system weighted expense and total latency as the block interval increases, because a larger block generation interval makes the blockchain delay higher. Additionally, the system weighted expense and total latency of the baselines are visibly higher than those of our proposed scheme. Therefore, with joint consideration of adaptive routing path selection, the optimal offloading decision, and appropriate block size selection, our proposed scheme performs the best compared with the other schemes.

6. Conclusions

In this paper, an improved optimization framework for the MEC-enabled smart rail system was proposed. In order to enable the MEC servers to be utilized over a wider range while meeting the low-latency requirement, a multi-hop ad hoc network was applied to the proposed network model. Moreover, blockchain technology based on the dBFT consensus mechanism was introduced to effectively guarantee security and reliability during multi-hop data transmission. Then, in order to reduce system latency and cost, the routing path selection, offloading strategy, and block size selection were co-optimized. We formulated the proposed dynamic optimization problem as an MDP and adopted dueling DQN to solve it. Simulation results demonstrated that the performance of the proposed scheme is better than that of existing baseline schemes. In the future, other routing mechanisms and a cloud-edge collaborative architecture will be considered in our multi-hop ad hoc network for the smart rail system.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported in part by the National Natural Science Foundation of China under grant no. 61901011, in part by the Beijing Natural Science Foundation under grant nos. L211002, 4222002, and L202016, and in part by the Foundation of Beijing Municipal Commission of Education under grant nos. KM202110005021 and KM202010005017.