Research Article
Rebalancing Docked Bicycle Sharing System with Approximate Dynamic Programming and Reinforcement Learning
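Algorithm
  Initialize the state, the parameters, and the learning rates.
  while [condition not preserved] do
    for [range not preserved] do
      Choose an action and observe the following state and reward.
      [three update equations not preserved]
      Update [quantities not preserved].
    end for
  end while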
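The symbols in the listing above did not survive, so the article's actual update rules cannot be reproduced here. As an illustration of the same loop shape only (initialize, choose an action, observe the next state and reward, compute an update, apply it), the following is a minimal sketch that swaps in a standard tabular Q-learning update as a stand-in. The toy chain environment, the names `step`, `choose_action`, `train`, and `greedy_policy`, and all numeric settings are assumptions made for this example, not taken from the article.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): reward 1.0 on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking among the best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=200, seed=0):
    """Stand-in for the listing: tabular Q-learning, not the article's method."""
    rng = random.Random(seed)
    # "Parameters" here are simply a table of action values, initialized to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):              # outer loop: episodes
        state = 0
        for _ in range(max_steps):         # inner loop: steps within an episode
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # One-step target and temporal-difference error.
            target = reward if done else reward + gamma * max(q[nxt])
            td_error = target - q[state][action]
            # Apply the update with learning rate alpha.
            q[state][action] += alpha * td_error
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action index for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns one action index per non-terminal state. A method with separate parameter sets and several learning rates, as the listing suggests, would replace the single table and the single `alpha` with its own update equations.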