Research Article
Optimizing the Pairs-Trading Strategy Using Deep Reinforcement Learning with Trading and Stop-Loss Boundaries
Algorithm 1
Optimized pairs-trading system using DQN.
Initialize replay memory and batch size
Initialize deep Q-network
Select pairs using cointegration test
(1) For each epoch do
(2) Profit = 1.0
(3) For steps t = 1, … until end of training data set do
(4) Calculate spreads using OLS or TLS methods
(5) Obtain initial state by converting spread to Z-score based on formation window
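(6) Using the epsilon-greedy method, with probability $\epsilon$ select a random action $a_t$
(7) Otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$, for current state $s_t$ and Q-network weights $\theta$
(8) Execute traditional pairs-trading strategy based on the action selected
(9) Obtain reward $r_t$ by performing the pairs-trading strategy
(10) Set next state $s_{t+1}$
(11) Store transition $(s_t, a_t, r_t, s_{t+1})$ in replay memory $D$
(12) Sample minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from replay memory $D$
(13) Set $y_j = r_j$ if $s_{j+1}$ is terminal; otherwise $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta)$, with discount factor $\gamma$
(14) Update Q-network by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$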
(15) End
(16) End
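
The building blocks of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`ols_spread`, `zscore`, `epsilon_greedy`, `td_targets`) are hypothetical, the Z-score is taken over a trailing window that includes the latest spread value, the Q-network is replaced by any callable `q_fn` returning one Q-value per action, and each stored transition carries an extra `done` flag to mark terminal states.

```python
import random
from collections import deque
from statistics import mean, pstdev


def ols_spread(y, x):
    """Step 4 (OLS variant): residual of y regressed on x with an intercept."""
    mx, my = mean(x), mean(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    return [b - (alpha + beta * a) for a, b in zip(x, y)]


def zscore(spread, window):
    """Step 5: Z-score of the latest spread over the last `window` values.

    Assumption: the formation window is the trailing window ending at the
    latest observation; a flat window yields 0.0 to avoid dividing by zero.
    """
    recent = spread[-window:]
    sd = pstdev(recent)
    return 0.0 if sd == 0 else (spread[-1] - mean(recent)) / sd


def epsilon_greedy(q_row, epsilon, rng=random):
    """Steps 6-7: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def td_targets(batch, q_fn, gamma):
    """Step 13: y_j = r_j if terminal, else r_j + gamma * max_a' Q(s', a')."""
    return [r if done else r + gamma * max(q_fn(s_next))
            for (_s, _a, r, s_next, done) in batch]


# Steps 11-12: bounded replay memory with uniform minibatch sampling.
# The capacity and the two transitions are illustrative values only.
memory = deque(maxlen=10_000)
memory.append((0.5, 1, 0.02, 0.3, False))  # (s, a, r, s_next, done)
memory.append((0.3, 0, -0.01, 0.0, True))
batch = random.sample(list(memory), k=2)
```

The targets returned by `td_targets` are the values a gradient step in step (14) would regress `Q(s_j, a_j; θ)` toward; the network and optimizer themselves are left out because the algorithm box does not specify them.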