Research Article
Genetic Scheduling and Reinforcement Learning in Multirobot Systems for Intelligent Warehouses
Algorithm 2: Q-learning for path planning.
(1) Initialize the total number of learning episodes Maxepi, the maximum number of steps per episode Maxstep, and the discount rate $\gamma$;
(2) Initialize the coordinates of the starting and ending points;
(3) Initialize $Q(s, a)$ based on the actual grid world;
(4) for each episode from 1 to Maxepi do
(5)     Initialize the learning rate $\alpha$;
(6)     for each step from 1 to Maxstep do
(7)         Decrease the learning rate $\alpha$ gradually;
(8)         Choose $a$ from $s$ by greedy action selection;
(9)         Take action $a$, observe $r$, $s'$;
(10)        $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$;
(11)        $s \leftarrow s'$;
(12)        if $s$ is a goal state then
(13)            Break;
(14)        end if
(15)    end for
(16) end for
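The loop structure of the Q-learning path-planning algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 5x5 grid, the obstacle cells, the reward scheme (+100 on reaching the goal, -1 per move), the multiplicative learning-rate decay, and all parameter values are hypothetical. "Initialize Q based on the actual grid world" is interpreted here as creating Q-table entries only for moves that stay on the grid and avoid obstacles. Step (8) uses an epsilon-greedy rule, which is a swapped-in exploration technique; setting `epsilon=0` gives pure greedy selection.

```python
import random

# Hypothetical 5x5 warehouse grid; cells are (row, col), OBSTACLES are blocked shelves.
ROWS, COLS = 5, 5
OBSTACLES = {(1, 1), (1, 3), (3, 1), (3, 3)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def valid_actions(state):
    """Indices of actions that keep the robot on the grid and off obstacles."""
    r, c = state
    result = []
    for i, (dr, dc) in enumerate(ACTIONS):
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in OBSTACLES:
            result.append(i)
    return result


def step(state, action, goal):
    """Apply a move; assumed reward: +100 on reaching the goal, -1 otherwise."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    reward = 100.0 if nxt == goal else -1.0
    return nxt, reward


def q_learning(start, goal, max_epi=500, max_step=100, gamma=0.9,
               alpha0=0.5, alpha_decay=0.99, alpha_min=0.05, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step (3): build Q(s, a) from the grid; only legal moves get an entry.
    q = {(r, c): {a: 0.0 for a in valid_actions((r, c))}
         for r in range(ROWS) for c in range(COLS) if (r, c) not in OBSTACLES}
    for _ in range(max_epi):                                # step (4)
        alpha = alpha0                                      # step (5)
        s = start
        for _ in range(max_step):                           # step (6)
            alpha = max(alpha_min, alpha * alpha_decay)     # step (7)
            # Step (8): epsilon-greedy; epsilon=0 reduces to pure greedy choice.
            if rng.random() < epsilon:
                a = rng.choice(list(q[s]))
            else:
                a = max(q[s], key=q[s].get)
            s_next, r = step(s, a, goal)                    # step (9)
            # Step (10): Q-learning update; the goal is terminal, so no bootstrap.
            if s_next == goal:
                target = r
            else:
                target = r + gamma * max(q[s_next].values())
            q[s][a] += alpha * (target - q[s][a])
            s = s_next                                      # step (11)
            if s == goal:                                   # steps (12)-(14)
                break
    return q


def greedy_path(q, start, goal, limit=50):
    """Follow the learned greedy policy from start until the goal or limit moves."""
    path = [start]
    s = start
    while s != goal and len(path) <= limit:
        a = max(q[s], key=q[s].get)
        s, _ = step(s, a, goal)
        path.append(s)
    return path


if __name__ == "__main__":
    q_table = q_learning(start=(0, 0), goal=(4, 4))
    print(greedy_path(q_table, (0, 0), (4, 4)))
```

Resetting $\alpha$ at the start of every episode and shrinking it at every step mirrors steps (5) and (7); the decay factor and floor are illustrative choices, since the algorithm only says the rate is decreased gradually.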