Research Article
Rebalancing Docked Bicycle Sharing System with Approximate Dynamic Programming and Reinforcement Learning
Table 1
Nomenclature used in this study.
| Category | Symbol | Description |
| --- | --- | --- |
| Sets | | Set of stations (0: depot) |
| | | Set of time steps |
| | | Set of states |
| | | Set of feasible actions |
| | | Set of policies |
| | | Sequence of decision points |
| Indices | | Decision point |
| | | Decision state |
| | | Point in time in state |
| Parameters | | Cargo vehicle capacity |
| | | Travel time between two stations |
| | | Service time for rebalancing per bicycle |
| | | Station capacity |
| | | Safety buffer |
| | | z-score for the safety stock |
| | | Station observed pickup demand at time |
| | | Station observed return demand at time |
| | | Station predicted pickup demand at time |
| | | Station predicted return demand at time |
| Variables | | Cargo vehicle load at time |
| | | Cargo vehicle location at time |
| | | The number of delivered bikes from the cargo vehicle at time |
| | | Station fill levels at time |
| | | Station fill rate index in time |
| | | Delivery decision |
| | | Next station decision |