Research Article

Optimizing the Pairs-Trading Strategy Using Deep Reinforcement Learning with Trading and Stop-Loss Boundaries

Algorithm 1

Optimized pairs-trading system using DQN.

Initialize replay memory and batch size
Initialize deep Q-network
Select pairs using cointegration test
(1) For each epoch do
(2) Profit = 1.0
(3) For steps t = 1, … until end of training data set do
(4) Calculate spreads using OLS or TLS methods
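(5) Obtain initial state $s_t$ by converting the spread to a Z-score based on the formation window
(6) Using the epsilon-greedy method, with probability $\epsilon$ select a random action $a_t$
(7) Otherwise select $a_t = \arg\max_{a} Q(s_t, a; \theta)$
(8) Execute the traditional pairs-trading strategy based on the selected action
(9) Obtain reward $r_t$ by performing the pairs-trading strategy
(10) Set next state $s_{t+1}$
(11) Store transition $(s_t, a_t, r_t, s_{t+1})$ in replay memory $D$
(12) Sample a minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $D$
(13) Set target $y_j = r_j$ if step $j+1$ is terminal; otherwise $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a'; \theta)$
(14) Update the Q-network by performing a gradient descent step on $(y_j - Q(s_j, a_j; \theta))^2$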
(15) End
(16) End
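
Steps (4) and (5) of Algorithm 1 can be illustrated with a minimal sketch. It assumes the OLS variant only (the TLS variant is not shown), a hedge ratio fitted on the formation window, and a spread normalized by the formation-window mean and standard deviation. The function names `ols_hedge_ratio` and `spread_zscore` are hypothetical and do not come from the paper.

```python
import numpy as np

def ols_hedge_ratio(x, y):
    """Least-squares fit y ~ alpha + beta * x; returns (alpha, beta)."""
    # np.polyfit returns coefficients from highest degree to lowest.
    beta, alpha = np.polyfit(x, y, 1)
    return alpha, beta

def spread_zscore(x_form, y_form, x_trade, y_trade):
    """Z-score of the trading-window spread, scaled by formation-window statistics.

    Assumption: the spread is the OLS residual y - (alpha + beta * x),
    with alpha and beta estimated on the formation window only.
    """
    alpha, beta = ols_hedge_ratio(x_form, y_form)
    spread_form = y_form - (alpha + beta * x_form)
    spread_trade = y_trade - (alpha + beta * x_trade)
    mu, sigma = spread_form.mean(), spread_form.std()
    return (spread_trade - mu) / sigma
```

Fitting on the formation window alone keeps trading-window prices out of the hedge ratio and the normalization, so the state at step t uses only information available before trading begins.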
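
The learning part of Algorithm 1, steps (6), (7), and (11) to (14), can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a linear Q-function stands in for the deep Q-network so that the example runs with NumPy alone, and each transition carries a `done` flag marking the end of the data. The names `LinearQ`, `select_action`, and `train_step` are hypothetical.

```python
import random
from collections import deque

import numpy as np

class LinearQ:
    """Linear stand-in for the deep Q-network: Q(s, .) = W @ s + b."""

    def __init__(self, state_dim, n_actions, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(n_actions, state_dim))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def q_values(self, s):
        return self.W @ s + self.b

    def update(self, batch, gamma):
        """One gradient step on the mean squared TD error (steps 13-14).

        The target y is treated as a constant, and the factor 2 from the
        squared error is absorbed into the learning rate.
        """
        grad_W = np.zeros_like(self.W)
        grad_b = np.zeros_like(self.b)
        for s, a, r, s_next, done in batch:
            y = r if done else r + gamma * self.q_values(s_next).max()
            td_error = self.q_values(s)[a] - y
            grad_W[a] += td_error * s
            grad_b[a] += td_error
        self.W -= self.lr * grad_W / len(batch)
        self.b -= self.lr * grad_b / len(batch)

def select_action(q, s, epsilon, n_actions, rng):
    """Epsilon-greedy choice (steps 6-7): explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return int(np.argmax(q.q_values(s)))

def train_step(q, memory, batch_size, gamma, rng):
    """Steps 12-14: sample a minibatch from replay memory and update Q.

    Returns False (and does nothing) until the memory holds a full batch.
    """
    if len(memory) < batch_size:
        return False
    batch = rng.sample(list(memory), batch_size)
    q.update(batch, gamma)
    return True
```

In use, the replay memory could be a `deque(maxlen=...)` to which each transition `(s, a, r, s_next, done)` is appended (step 11) before `train_step` is called. Sampling uniformly from the memory, rather than training on consecutive steps, reduces the correlation between the transitions in a minibatch.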