Reinforcement Learning-Based Service-Oriented Dynamic Multipath Routing in SDN

<div>Performance of the three path distribution schemes in the realistic six-service scenario (average values are computed over the interval from the 500th to the 1500th second, after convergence). (a) Reward gained by SPD; (b) reward gained by DQN; (c) reward gained by RED-STAR; (d) average reward of three services for each scheme; (e) maximum bandwidth utilization of each scheme.</div>

Wireless Communications and Mobile Computing

Figure 16
