Research Article

Genetic Scheduling and Reinforcement Learning in Multirobot Systems for Intelligent Warehouses

Algorithm 2

$Q$-learning for path planning.
(1) Initialize the total number of learning episodes Maxepi, the maximum number of steps Maxstep, and the discount rate $\gamma$;
(2) Initialize the coordinates of the starting and ending points;
(3) Initialize $Q(s, a)$ based on the actual grid world;
(4) for $i = 1$ to Maxepi do
(5)   Initialize the learning rate $\alpha$;
(6)   for $t = 1$ to Maxstep do
(7)     Decrease the learning rate $\alpha$ gradually;
(8)     Choose $a_t$ from $s_t$ by greedy action selection;
(9)     Take action $a_t$, observe $r_t$, $s_{t+1}$;
(10)    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)]$;
(11)    $s_t \leftarrow s_{t+1}$;
(12)    if $s_{t+1}$ is a goal state then
(13)      Break;
(14)    end if
(15)   end for
(16) end for
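
The loop in Algorithm 2 can be illustrated with a minimal Python sketch of tabular Q-learning on a small occupancy grid. It is not the authors' implementation. Several details are assumptions made only for illustration: the four-connected move set, the zero initialization of the Q-table, the reward values (1.0 at the goal, -0.01 per step), the decay schedule for the learning rate, and the exploration probability `epsilon` (setting `epsilon=0` gives purely greedy selection). The function names `step`, `q_learning`, and `greedy_path` are likewise hypothetical.

```python
import random

# Assumed move set: up, down, left, right on a 4-connected grid.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(grid, state, action):
    """Apply a move; leaving the grid or hitting an obstacle (1) keeps the robot in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
        return (r, c)
    return state


def q_learning(grid, start, goal, max_epi=500, max_step=200,
               gamma=0.9, alpha0=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning following the structure of Algorithm 2."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    # Step (3): one entry per (cell, action); zero initialization is an assumption.
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(rows) for c in range(cols)}
    for _ in range(max_epi):                      # step (4)
        state = start
        for t in range(max_step):                 # step (6)
            # Steps (5) and (7): start from alpha0 and decay gradually (assumed schedule).
            alpha = alpha0 / (1.0 + 0.01 * t)
            # Step (8): greedy choice with random tie-breaking, plus assumed exploration.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            # Step (9): take the action, observe reward and next state.
            nxt = step(grid, state, ACTIONS[a])
            reward = 1.0 if nxt == goal else -0.01  # assumed reward values
            # Step (10): temporal-difference update of Q(s_t, a_t).
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt                           # step (11)
            if state == goal:                     # steps (12)-(14)
                break
    return q


def greedy_path(q, grid, start, goal, max_step=200):
    """Follow the highest-valued action from start until the goal or the step limit."""
    path, state = [start], start
    for _ in range(max_step):
        if state == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state = step(grid, state, ACTIONS[a])
        path.append(state)
    return path


if __name__ == "__main__":
    # 0 = free cell, 1 = obstacle (hypothetical warehouse layout).
    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0],
            [0, 1, 0, 0]]
    q_table = q_learning(grid, start=(0, 0), goal=(3, 3))
    print(greedy_path(q_table, grid, (0, 0), (3, 3)))
```

After training, `greedy_path` reads a route out of the learned table by repeatedly taking the highest-valued action. The goal's own Q-values stay at zero because each episode ends as soon as the goal is reached, so the goal acts as a terminal state in the update.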