
Realized PnL Trading Environment

Kartikay Garg edited this page May 1, 2018 · 3 revisions


Let

  • $l(t_i)$ be the amount of long currency,

  • $s(t_i)$ be the amount of short currency, and

  • $p(t_i)$ be the price of the currency

at time instant $t_i$. The following assumptions are made:

  • the agent starts with an initial amount of 0

  • due to the short duration of episodes (the maximum time range allowed is 10 minutes):

    • the agent can borrow any amount of money at any timestep at a 0% interest rate, with a promise to settle at the end of the episode

    • future rewards are not discounted
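To make the borrowing assumption concrete, here is a minimal sketch of a cash ledger, assuming trades execute at the current price $p(t_i)$ with no transaction fees. The `Ledger` class and its `trade`/`settle` methods are hypothetical illustrations, not part of `gym_cryptotrading`. Cash may go negative (money borrowed at 0% interest), and settling at the end of the episode closes every position at the final price, so the cash left over is the net profit or loss.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    """Hypothetical cash ledger illustrating the assumptions (not the library's API)."""

    cash: float = 0.0   # agent starts with 0; negative cash = money borrowed at 0% interest
    long: float = 0.0   # l(t_i): units of currency held long
    short: float = 0.0  # s(t_i): units of currency sold short

    def trade(self, buy_long: float, sell_short: float, price: float) -> None:
        """Open (positive amounts) or reduce (negative amounts) positions at price p(t_i)."""
        self.cash -= buy_long * price    # buying long spends cash, borrowing if needed
        self.cash += sell_short * price  # selling short brings cash in
        self.long += buy_long
        self.short += sell_short

    def settle(self, final_price: float) -> float:
        """Close every position at the final price; with 0% interest the cash left is the net PnL."""
        self.cash += self.long * final_price   # sell off the long holdings
        self.cash -= self.short * final_price  # buy back the shorted currency
        self.long = self.short = 0.0
        return self.cash


ledger = Ledger()
ledger.trade(buy_long=2.0, sell_short=0.0, price=100.0)  # cash = -200.0 (borrowed)
ledger.trade(buy_long=0.0, sell_short=1.0, price=110.0)  # cash = -90.0
print(ledger.settle(final_price=105.0))  # 15.0
```

Here buying 2 units at 100 borrows 200, opening a 1-unit short at 110 brings in 110, and settling at 105 leaves 15: a gain of 10 on the long position plus 5 on the short.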

When trading at time instant $t_i$, the agent is rewarded for its portfolio status between $t_i$ and $t_{i+1}$, since the portfolio is kept the same over that entire interval.
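The page does not spell out how the portfolio status over an interval is valued. Assuming the usual mark-to-market convention with no transaction costs, the profit or loss $r_i$ from holding $l(t_i)$ long and $s(t_i)$ short between $t_i$ and $t_{i+1}$, and its total over an episode of $N$ intervals, would be:

```latex
r_i = \bigl(l(t_i) - s(t_i)\bigr)\,\bigl(p(t_{i+1}) - p(t_i)\bigr),
\qquad
\text{net PnL} = \sum_{i=0}^{N-1} r_i
```

Because future rewards are not discounted, this plain sum is the quantity the agent is ultimately paid for.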

Reward Function

  • it is the most intuitive and natural reward function to make an agent learn how to trade an asset

  • the agent starts out with an initial amount of 0 and trades throughout the episode; it is given 0 reward at every timestep except the last, when it is rewarded with the net profit or loss it has made over the entire episode

  • hence the name realized PnL (profit and loss)

  • it has been proved that this reward function leads training to converge to the optimal policy

  • however, the learning process is slow because every intermediate reward is 0, so the agent receives a learning signal only at the end of each episode
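A minimal Python sketch of this reward schedule, assuming the PnL of each interval is valued as $(l(t_i) - s(t_i))(p(t_{i+1}) - p(t_i))$. That valuation and the helper names `interval_pnl` and `realized_pnl_rewards` are hypothetical, not taken from `gym_cryptotrading`. Every timestep gets 0 reward except the last, which receives the episode's net PnL.

```python
from typing import List, Sequence


def interval_pnl(long: Sequence[float], short: Sequence[float],
                 prices: Sequence[float]) -> List[float]:
    """Assumed PnL of the held portfolio over each interval [t_i, t_{i+1}]."""
    return [
        (long[i] - short[i]) * (prices[i + 1] - prices[i])
        for i in range(len(prices) - 1)
    ]


def realized_pnl_rewards(long: Sequence[float], short: Sequence[float],
                         prices: Sequence[float]) -> List[float]:
    """0 reward at every timestep except the last, which gets the net PnL."""
    pnl = interval_pnl(long, short, prices)
    rewards = [0.0] * len(pnl)
    if rewards:
        rewards[-1] = sum(pnl)  # undiscounted total over the whole episode
    return rewards


# Hold 2 long from t_0, add a 1-unit short at t_1; prices observed at t_0, t_1, t_2.
print(realized_pnl_rewards([2.0, 2.0], [0.0, 1.0], [100.0, 110.0, 105.0]))  # [0.0, 15.0]
```

Since rewards are undiscounted, the episode return `sum(rewards)` equals the net PnL; the realized-PnL scheme simply withholds all of it until the final step, which is why learning is slow.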

Usage

```python
import gym
import gym_cryptotrading  # importing the package registers its environments with gym

env = gym.make('RealizedPnLEnv-v0')
```