For each agent and episode do
  IF  is the critical section
    Initialise , 
  ELSE
    Initialise , , 
  For each control step do (Loop I)
    (i) get detected data from each cell : , , , , 
    (ii) get state  through (8) and (9)
    (iii) get action  by ε-greedy policy (10)
    (iv) get ,  through (6) and do , 
    IF  is the critical section
      update ,  through (2) and (12)
    ELSE
      update , ,  through (4) and (13)
    IF  and  end the algorithm
    ELSE get , ,  by (7) and do , , , and start Loop II
    For each planning step do (Loop II)
      (i) generate flow rates for each cell :  through (5)
      (ii) get the state 
      (iii) get ,  and do , 
      (iv) get action  by ε-greedy policy
      IF  is the critical section
        update , 
      ELSE
        update , , 
      IF  or ( and ) go back to Loop I
      ELSE repeat Loop II
    EndFor
  EndFor
EndFor
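The nested structure of the listing (a control loop that learns from measured data, and a planning loop that learns from model-generated data) resembles tabular Dyna-Q. The following is a minimal sketch of that general pattern only, not the algorithm in the listing. Every name in it is hypothetical. The toy 1-D environment `toy_step` stands in for the traffic cells. The standard one-step Q-learning update stands in for the update rules referenced as (2), (4), (12) and (13), which are not reproduced here. Planning replays stored transitions at random rather than generating flow rates through (5), and the split between critical and non-critical sections is not modelled.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon explore; otherwise act greedily, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if q[(state, a)] == best])


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Standard one-step Q-learning update (a stand-in, not the source's update rules)."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def toy_step(state, action):
    """Hypothetical environment: states 0..4, goal at 4, actions -1 and +1."""
    s_next = min(4, max(0, state + action))
    reward = 1.0 if s_next == 4 else 0.0
    return s_next, reward, s_next == 4


def run_dyna_q(episodes=50, control_steps=100, planning_steps=10,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = [-1, 1]
    q = defaultdict(float)   # tabular action values, initialised to zero
    model = {}               # learned model: (state, action) -> (next state, reward)
    for _ in range(episodes):
        state = 0
        for _ in range(control_steps):                 # Loop I: real experience
            action = epsilon_greedy(q, state, actions, epsilon, rng)
            s_next, reward, done = toy_step(state, action)
            q_update(q, state, action, reward, s_next, actions, alpha, gamma)
            model[(state, action)] = (s_next, reward)
            for _ in range(planning_steps):            # Loop II: simulated experience
                ps, pa = rng.choice(list(model))
                pn, pr = model[(ps, pa)]
                q_update(q, ps, pa, pr, pn, actions, alpha, gamma)
            state = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    q = run_dyna_q()
    for s in range(4):
        print(s, round(q[(s, -1)], 3), round(q[(s, 1)], 3))
```

In this sketch the planning loop runs a fixed number of steps after every control step. The listing instead leaves Loop II on an explicit condition, so the fixed budget here is a simplification.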