Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning

<div><b>The average cumulative reward curves.</b> Each point is the average cumulative reward per 100 episodes. The <i>y</i>-axis denotes the average cumulative reward, and the <i>x</i>-axis denotes the iteration epoch.</div>
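The caption does not spell out how each plotted point is computed. The sketch below is a hedged illustration that assumes each point is the mean return over one non-overlapping block of 100 consecutive episodes, with any trailing partial block dropped. A moving average would be another valid reading of "per hundred episodes." The function name `block_average_rewards` is hypothetical and is not taken from the paper.

```python
from typing import List, Sequence


def block_average_rewards(episode_returns: Sequence[float], block_size: int = 100) -> List[float]:
    """Mean cumulative reward over consecutive, non-overlapping blocks of episodes.

    Assumption (not stated in the paper): each plotted point is the mean of one
    block of `block_size` episodes; a trailing partial block is discarded.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(episode_returns) // block_size  # number of complete blocks
    return [
        sum(episode_returns[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


# Usage with synthetic returns (illustrative only, not data from the paper):
returns = [float(i) for i in range(250)]
print(block_average_rewards(returns))  # [49.5, 149.5]
```

Here 250 synthetic episodes yield two complete blocks. The last 50 episodes are ignored because they do not fill a full block of 100.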

Journal of Robotics

Figure 5