Research Article

Rebalancing Docked Bicycle Sharing System with Approximate Dynamic Programming and Reinforcement Learning

Algorithm 2

Actor-critic.
Initialize state , parameters , and learning rates .
while do
 for do
  Choose an action and observe following state and reward .
  .
  .
  .
  Update , .