Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning

<div><b>Training curves of the loss function of the target Q network.</b> Each point is the loss value averaged over ten epochs. The <i>y</i>-axis denotes the value of the loss function and the <i>x</i>-axis denotes the training epoch.</div>

Journal of Robotics

Figure 4: Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning