A Sentence-Level Joint Relation Classification Model Based on Reinforcement Learning

<table class="algorithm-group"><tr><td><table class="algorithm" id="alg1">
<tr><td> </td><td><b>Input</b>: Number of episodes <i>N</i>; training data <i>X</i>; initialized RL model parameters <i>ψ</i> and joint network model parameters <i>θ</i></td></tr>
<tr><td> </td><td><b>Output</b>: RL model parameters <i>ψ</i> and joint network model parameters <i>θ</i></td></tr>
<tr><td>(1)</td><td><b>for</b> episode <i>n</i> = 1 to <i>N</i> <b>do</b></td></tr>
<tr><td>(2)</td><td><b>foreach</b> <i>x</i><sub><i>i</i></sub> ∈ <i>X</i> <b>do</b></td></tr>
<tr><td>(3)</td><td>Calculate the predicted score for each state</td></tr>
<tr><td>(4)</td><td>Select the action to take on the state according to the predicted score</td></tr>
<tr><td>(5)</td><td>Calculate the temporary and average rewards</td></tr>
<tr><td>(6)</td><td>Update the parameters <i>ψ</i> of the RL model</td></tr>
<tr><td>(7)</td><td>Calculate the total reward</td></tr>
<tr><td>(8)</td><td><b>end foreach</b></td></tr>
<tr><td>(9)</td><td>Train and update the parameters <i>θ</i> of the joint network model</td></tr>
<tr><td>(10)</td><td>Update the parameters <i>ψ</i> of the RL model</td></tr>
<tr><td>(11)</td><td>Find the best parameters for the RL model according to the reward</td></tr>
<tr><td>(12)</td><td>Update the weights of the RL networks</td></tr>
<tr><td>(13)</td><td><b>end for</b></td></tr>
</table></td></tr></table>

<div>Joint training of the RL model and the joint network model.</div>
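
The control flow of Algorithm 1 can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation: the source does not specify the state features, the reward, or the network architectures, so the sketch assumes a logistic policy over a hand-made feature vector (parameters `psi`), a logistic-regression stand-in for the joint network (parameters `theta`), a keep/discard action per sentence, and a reward equal to the joint model's probability of the gold label minus 0.5 for kept sentences. The function name `joint_train` and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def joint_train(X, y, episodes=5, lr_rl=0.1, lr_joint=0.1, seed=0):
    """Toy rendering of Algorithm 1 (all modelling choices are assumptions).

    X: list of feature vectors (one per sentence), y: list of 0/1 labels.
    Returns the best policy parameters, the joint-model parameters, and the
    total reward recorded in each episode.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    psi = [0.0] * dim    # RL (policy) parameters
    theta = [0.0] * dim  # joint network parameters (logistic regression here)
    best_psi, best_total = list(psi), -math.inf
    history = []

    for _ in range(episodes):                                # step (1)
        total_reward, seen, kept = 0.0, 0, []
        for x_i, y_i in zip(X, y):                           # step (2)
            score = sigmoid(dot(psi, x_i))                   # step (3)
            action = 1 if rng.random() < score else 0        # step (4)
            # Step (5): temporary reward (assumed form) and running average.
            p = sigmoid(dot(theta, x_i))
            p_gold = p if y_i == 1 else 1.0 - p
            reward = (p_gold - 0.5) if action == 1 else 0.0
            average = total_reward / seen if seen else 0.0
            # Step (6): REINFORCE-style update, average reward as baseline.
            scale = (reward - average) * (action - score)
            psi = [w + lr_rl * scale * xi for w, xi in zip(psi, x_i)]
            total_reward += reward                           # step (7)
            seen += 1
            if action == 1:
                kept.append((x_i, y_i))
        # Step (9): train the joint model on the sentences the policy kept.
        for x_i, y_i in kept:
            err = y_i - sigmoid(dot(theta, x_i))
            theta = [w + lr_joint * err * xi for w, xi in zip(theta, x_i)]
        # Steps (10)-(12), simplified: remember the policy parameters that
        # achieved the highest total reward so far.
        if total_reward > best_total:
            best_total, best_psi = total_reward, list(psi)
        history.append(total_reward)
    return best_psi, theta, history


X = [[1.0, 0.5], [1.0, -0.3], [1.0, 0.8], [1.0, -0.9]]
y = [1, 0, 1, 0]
best_psi, theta, history = joint_train(X, y, episodes=3)
```

In this sketch the policy is updated after every sentence while the joint model is updated once per episode, mirroring the nesting of the inner and outer loops in the algorithm. Steps (10) to (12) are collapsed into a single "keep the best `psi`" rule, which is one plausible reading of the listing rather than a confirmed detail.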

Computational Intelligence and Neuroscience

Algorithm 1: A Sentence-Level Joint Relation Classification Model Based on Reinforcement Learning