Wireless Communications and Mobile Computing

Research Article

Anti-Attack Scheme for Edge Devices Based on Deep Reinforcement Learning

Optimal strategy.

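01: Initialize replay memory $D$;
02: Initialize anticipatory parameters $\theta$;
03: Initialize target function $\hat{Q}$ with weights $\theta^{-} = \theta$;
04: for episode $= 1, \dots, M$ do
05:   Set policy $\pi$;
06:   Receive initial observation state $s_1$ and reward $r_1$;
07:   for $t = 1, \dots, T$ do
08:     Select action $a_t$ from policy $\pi$;
09:     Execute action $a_t$, then observe reward $r_t$ and new state $s_{t+1}$;
10:     Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$;
11:     Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$;
12:     if the episode terminates at step $j+1$ then
13:       $y_j = r_j$;
14:     else
15:       $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$;
16:     end if
17:     Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$;
19:     Periodically update the target network: $\theta^{-} \leftarrow \theta$;
20:   end for
21: end for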
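
The loop above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical five-state chain environment (`step`), an epsilon-greedy policy for step 05, and a tabular Q-function standing in for the neural network, so that the gradient step reduces to a per-entry update. All names and hyperparameters (`train`, `GAMMA`, `LR`, `EPSILON`, `BATCH`, `SYNC_EVERY`) are illustrative assumptions.

```python
import random

# Toy setting (an assumption, not from the paper): a 5-state chain with 2 actions.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2   # discount, step size, exploration rate
BATCH, SYNC_EVERY = 8, 20            # minibatch size, target-sync period


def step(state, action):
    """Hypothetical dynamics: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, max_steps=50, seed=0):
    rng = random.Random(seed)
    replay = []                                            # step 01: replay memory
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # step 02: online parameters
    theta_target = [row[:] for row in theta]               # step 03: target weights
    updates = 0
    for _ in range(episodes):                              # step 04
        s = 0                                              # step 06: initial state
        for _ in range(max_steps):                         # step 07
            # steps 05/08: epsilon-greedy action, ties broken at random
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS),
                        key=lambda k: (theta[s][k], rng.random()))
            s2, r, done = step(s, a)                       # step 09
            replay.append((s, a, r, s2, done))             # step 10
            batch = rng.sample(replay, min(BATCH, len(replay)))  # step 11
            for sj, aj, rj, sj2, dj in batch:
                # steps 12-16: target uses the frozen target weights
                y = rj if dj else rj + GAMMA * max(theta_target[sj2])
                # step 17: gradient step on (y - Q)^2 for a tabular Q;
                # the constant factor 2 is absorbed into LR
                theta[sj][aj] += LR * (y - theta[sj][aj])
            updates += 1
            if updates % SYNC_EVERY == 0:                  # step 19: sync target
                theta_target = [row[:] for row in theta]
            s = s2
            if done:
                break
    return theta


if __name__ == "__main__":
    q = train()
    for state, row in enumerate(q):
        print(state, [round(v, 3) for v in row])
```

In this toy setting the learned values would be expected to favor the "move right" action in every non-terminal state. Replacing the table with a neural network changes only the parameter update in step 17 and the copy in step 19.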