Research Article

Anti-Attack Scheme for Edge Devices Based on Deep Reinforcement Learning

Algorithm 2

Optimal strategy.
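01: Initialize replay memory $D$;
02: Initialize anticipatory parameters $\theta$;
03: Initialize target function $\hat{Q}$ with weights $\theta^{-} = \theta$;
04: for episode $= 1, \dots, M$ do
05: Set policy $\pi$;
06: Receive the initial observation state $s_1$ and reward;
07: for $t = 1, \dots, T$ do
08:  Select action $a_t$ from policy $\pi$;
09:  Execute action $a_t$, then observe reward $r_t$ and new state $s_{t+1}$;
10:  Store transition $(s_t, a_t, r_t, s_{t+1})$ in $D$;
11:  Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$;
12:  if the episode terminates at step $j+1$ then
13:   $y_j = r_j$;
14:  else
15:   $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$;
16:  end if
17:  Perform a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$ with respect to the network parameters $\theta$;
19:  Periodically update the target network: $\theta^{-} \leftarrow \theta$;
20: end for
21: end for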
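
The loop structure of Algorithm 2 (replay memory, a separate target function, minibatch targets, and periodic target updates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay self-contained it swaps the deep network for a plain Q-table, and it assumes an epsilon-greedy policy, which Algorithm 2 does not specify. The `ChainEnv` environment, the `train` function, and every hyperparameter value are hypothetical choices made only for illustration.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n_states-1, goal at the right end."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clamped at state 0).
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        terminal = self.state == self.n_states - 1
        reward = 1.0 if terminal else 0.0
        return self.state, reward, terminal


def train(episodes=300, max_steps=20, gamma=0.9, lr=0.1, epsilon=0.2,
          batch_size=8, memory_size=500, sync_every=20, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    n_actions = 2
    memory = []  # replay memory
    # Assumption: a Q-table stands in for the network; q[s][a] plays the role of Q(s, a).
    q = [[0.0] * n_actions for _ in range(env.n_states)]
    q_target = [row[:] for row in q]  # target weights start as a copy of the online ones
    total_steps = 0

    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Assumed epsilon-greedy policy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([k for k in range(n_actions) if q[s][k] == best])
            s_next, r, terminal = env.step(a)

            # Store the transition, dropping the oldest one when memory is full.
            memory.append((s, a, r, s_next, terminal))
            if len(memory) > memory_size:
                memory.pop(0)

            # Sample a minibatch and move each Q(s_j, a_j) toward its target y_j.
            batch = rng.sample(memory, min(batch_size, len(memory)))
            for sj, aj, rj, sj_next, tj in batch:
                y = rj if tj else rj + gamma * max(q_target[sj_next])
                # Gradient descent step on 0.5 * (y - Q)^2 for this table entry.
                q[sj][aj] += lr * (y - q[sj][aj])

            # Periodically copy the online weights into the target weights.
            total_steps += 1
            if total_steps % sync_every == 0:
                q_target = [row[:] for row in q]

            if terminal:
                break
            s = s_next
    return q


if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, [round(v, 3) for v in row])
```

Calling `train()` returns the learned table, and the greedy action in a state is the index of the largest entry in that state's row. In this toy chain the learned policy should prefer moving right in every non-terminal state. Replacing the table with a neural network changes only the update line, which becomes a gradient step on the squared error with respect to the network weights.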