Research Article

Rebalancing Docked Bicycle Sharing System with Approximate Dynamic Programming and Reinforcement Learning

Table 1

Nomenclature used in this study.

Sets
Set of stations (0: depot)
Set of time steps
Set of states
Set of feasible actions
Set of policies
Sequence of decision points

Indices
Decision point
Decision state
Point in time in state

Parameters
Cargo vehicle capacity
Travel time between two stations
Service time for rebalancing per bicycle
Station capacity
Safety buffer
z-score for the safety stock
Station observed pickup demand at time
Station observed return demand at time
Station predicted pickup demand at time
Station predicted return demand at time

Variables
Cargo vehicle load at time
Cargo vehicle location at time
Number of bicycles delivered by the cargo vehicle at time
Station fill levels at time
Station fill rate index at time
Delivery decision
Next station decision
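
As an illustration only, the parameters and variables listed in Table 1 could be grouped into a small data structure. The sketch below is not the authors' implementation: every class, field, and function name (`Parameters`, `State`, `is_within_capacity`, and so on) is hypothetical, and the capacity check encodes only the assumption that vehicle load and station fill levels stay between zero and their respective capacities.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Parameters:
    """Fixed problem data (hypothetical names for a subset of the Table 1 parameters)."""
    vehicle_capacity: int                       # cargo vehicle capacity
    travel_time: Dict[Tuple[int, int], float]   # travel time between two stations
    service_time_per_bike: float                # service time for rebalancing per bicycle
    station_capacity: Dict[int, int]            # capacity of each station


@dataclass
class State:
    """Quantities that change over time (hypothetical names for Table 1 variables)."""
    time: int                     # current time step
    vehicle_load: int             # cargo vehicle load
    vehicle_location: int         # station where the vehicle is (0 is the depot)
    fill_levels: Dict[int, int]   # number of bicycles at each station


def is_within_capacity(params: Parameters, state: State) -> bool:
    """Assumed sanity check: loads and fill levels must lie within capacities."""
    if not 0 <= state.vehicle_load <= params.vehicle_capacity:
        return False
    return all(
        0 <= level <= params.station_capacity[station]
        for station, level in state.fill_levels.items()
    )


# Made-up example values, chosen only to exercise the check.
example_params = Parameters(
    vehicle_capacity=20,
    travel_time={(0, 1): 5.0, (1, 2): 3.0},
    service_time_per_bike=0.5,
    station_capacity={1: 15, 2: 10},
)
example_state = State(time=0, vehicle_load=8, vehicle_location=0,
                      fill_levels={1: 12, 2: 4})
print(is_within_capacity(example_params, example_state))
```

Keeping the fixed parameters in a frozen dataclass and the time-varying variables in a separate mutable one mirrors the Parameters and Variables split of the table.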