Research Article

DQfD-AIPT: An Intelligent Penetration Testing Framework Incorporating Expert Demonstration Data

Algorithm 1

Prioritized experience replay (PER) implemented with a sum tree structure.
Input: $k$: batch size; $N$: capacity of the sum tree; $n$: the number of transitions currently stored in the sum tree; an initialised sum tree structure
Output: The updated sum tree structure after sampling the transitions
(1) if $n < N$ then
(2)   Push the transition into the sum tree with maximal priority
(3) end if
(4) Start sampling $k$ transitions from the sum tree
(5) Calculate the segment length $\mathrm{seg} = p_{\mathrm{total}} / k$, where $p_{\mathrm{total}}$ is the priority sum stored at the root
(6) for $i = 1, \dots, k$ do
(7)   Sample a transition from the sum tree according to its priority
(8)   $a = \mathrm{seg} \cdot (i - 1)$, $b = \mathrm{seg} \cdot i$
(9)   Generate a random number $s$ between $a$ and $b$
(10)  Retrieve the transition and its priority from the sum tree according to $s$
(11)  Compute the importance sampling weight for the transition
(12) end for
(13) Train on this batch of transitions and compute the TD error, weighted by the importance sampling weights
(14) Update the transition priorities according to the TD error
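
The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SumTree`, `sample_batch`, and `update_priorities` are hypothetical, and the exponents `alpha` and `beta`, the offset `eps`, and the initial maximal priority of 1.0 are assumed values taken from common PER practice. The sketch also assumes that a full tree overwrites its oldest transition, which the algorithm does not state.

```python
import random


class SumTree:
    """Array-backed sum tree: leaves hold priorities, parents hold child sums."""

    def __init__(self, capacity):
        self.capacity = capacity                 # maximum number of transitions
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes first, then leaves
        self.data = [None] * capacity            # stored transitions
        self.size = 0                            # transitions currently stored
        self.write = 0                           # next leaf slot to fill
        self.max_p = 1.0                         # assumed initial maximal priority

    def total(self):
        return self.tree[0]                      # root stores the priority sum

    def update(self, leaf, priority):
        idx = leaf + self.capacity - 1
        self.tree[idx] = priority
        self.max_p = max(self.max_p, priority)
        while idx > 0:                           # recompute sums up to the root
            idx = (idx - 1) // 2
            self.tree[idx] = self.tree[2 * idx + 1] + self.tree[2 * idx + 2]

    def add(self, transition):
        # Steps (1)-(3): new transitions enter with maximal priority.
        # Assumption: when the tree is full, the oldest slot is overwritten.
        self.data[self.write] = transition
        self.update(self.write, self.max_p)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def get(self, s):
        # Step (10): descend from the root to the leaf whose range contains s.
        idx = 0
        while idx < self.capacity - 1:           # idx is an internal node
            left, right = 2 * idx + 1, 2 * idx + 2
            # Never descend into an empty subtree (guards against s == 0
            # and floating-point rounding).
            if self.tree[right] == 0.0 or (
                self.tree[left] > 0.0 and s <= self.tree[left]
            ):
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        leaf = idx - (self.capacity - 1)
        return leaf, self.tree[idx], self.data[leaf]


def sample_batch(tree, k, beta=0.4):
    """Steps (4)-(12): stratified sampling with importance sampling weights."""
    segment = tree.total() / k                   # step (5)
    batch = []
    for i in range(k):
        a, b = segment * i, segment * (i + 1)    # step (8), zero-based index
        s = random.uniform(a, b)                 # step (9)
        leaf, priority, transition = tree.get(s)
        prob = priority / tree.total()
        weight = (tree.size * prob) ** (-beta)   # step (11)
        batch.append((leaf, transition, weight))
    max_w = max(w for _, _, w in batch)          # normalise so weights are <= 1
    return [(leaf, t, w / max_w) for leaf, t, w in batch]


def update_priorities(tree, leaves, td_errors, alpha=0.6, eps=1e-6):
    """Step (14): set each priority to (|TD error| + eps) ** alpha."""
    for leaf, delta in zip(leaves, td_errors):
        tree.update(leaf, (abs(delta) + eps) ** alpha)
```

Splitting the total priority into equal segments and drawing one number per segment spreads the batch across the whole priority range, while the tree descent keeps both sampling and priority updates logarithmic in the capacity.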