Research Article
Anti-Attack Scheme for Edge Devices Based on Deep Reinforcement Learning
Initialize the replay memory.
Initialize the anticipatory parameters.
Initialize the target function with its weights.
for each episode do
    Set the policy.
    Receive the initial observation state and reward.
    for each time step t do
        Select action a_t from the policy.
        Execute action a_t; observe the reward and the new state.
        Store the transition (state, action, reward, new state) in the replay memory.
        Sample a random minibatch of transitions from the replay memory.
        if the episode terminates at the sampled step then
            Set the target value to the sampled reward.
        else
            Set the target value to the sampled reward plus the discounted target-function estimate for the next state.
        end if
        Perform a gradient descent step on the loss with respect to the network parameters.
        Periodically update the target networks.
    end for
end for
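The training loop above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the deep network is replaced by a small table of value estimates so that the example runs with the standard library alone, and the five-state chain environment (`env_step`), the epsilon-greedy policy, and all hyperparameters (`GAMMA`, `LR`, `EPSILON`, `BATCH`, `SYNC_EVERY`) are hypothetical choices made only for illustration.

```python
import random
from collections import deque

# All constants below are illustrative assumptions, not values from the article.
N_STATES, N_ACTIONS = 5, 2          # toy chain: action 1 moves right, action 0 moves left
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2  # discount factor, step size, exploration rate
BATCH, SYNC_EVERY = 16, 20          # minibatch size, target-update period (in steps)


def env_step(state, action):
    """Hypothetical environment: reward 1.0 on reaching the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, max_steps=50, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                        # replay memory
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # online value estimates
    q_target = [row[:] for row in q]                   # target function (copied weights)
    steps = 0
    for _ in range(episodes):
        state = 0                                      # initial observation state
        for _ in range(max_steps):
            # Epsilon-greedy policy with random tie-breaking.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(N_ACTIONS) if q[state][a] == best])
            nxt, reward, done = env_step(state, action)
            memory.append((state, action, reward, nxt, done))  # store transition
            batch = rng.sample(list(memory), min(BATCH, len(memory)))
            for s, a, r, s2, d in batch:
                # Target: reward only if terminal, else reward + discounted target estimate.
                target = r if d else r + GAMMA * max(q_target[s2])
                # Gradient descent step on 0.5 * (q[s][a] - target) ** 2.
                q[s][a] -= LR * (q[s][a] - target)
            steps += 1
            if steps % SYNC_EVERY == 0:                # periodic target update
                q_target = [row[:] for row in q]
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    table = train()
    for s, row in enumerate(table):
        print(s, [round(v, 3) for v in row])
```

In this sketch, `train()` returns the learned table, and the greedy action for a state `s` is the index of the largest entry in `table[s]`. Replacing the table with a neural network would change only the value lookup and the gradient step; the replay memory, the target computation, and the periodic target update keep the same shape.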