Realized PnL Trading Environment
Let,
- l(t_i) be the amount of long currency,
- s(t_i) be the amount of short currency, and
- p(t_i) be the price of the currency

at time instant t_i. The following assumptions are made:
- the agent starts with 0 initial amount, owing to the short duration of episodes (the maximum time range allowed is 10 minutes)
- the agent can borrow any amount of money at any timestep at a 0% interest rate, with a promise to settle at the end of the episode
- future rewards are not discounted
- when trading at time instant t_i, the agent is rewarded for its portfolio status between t_i and t_{i+1}, since the portfolio is held unchanged over this entire interval (one plausible formalization is sketched just below)
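One way to formalize this scheme, written here as an illustration rather than the repository's own definition: the net position l(t_j) - s(t_j) held over [t_j, t_{j+1}] earns the position times the price change, and the accumulated sum is paid out only at the final timestep T.

```latex
% Hedged sketch: one plausible formalization of the realized PnL reward.
% T is the final timestep; l, s, p are as defined above.
r(t_i) =
\begin{cases}
  0, & i < T, \\
  \displaystyle\sum_{j=0}^{T-1} \bigl(l(t_j) - s(t_j)\bigr)\bigl(p(t_{j+1}) - p(t_j)\bigr), & i = T.
\end{cases}
```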
- it is the most intuitive and natural reward function for teaching an agent to trade an asset
- the agent starts out with 0 initial amount and trades throughout the episode, receiving a reward of 0 at every timestep except the last, when it is rewarded with the net profit or loss it has made over the entire episode; thus the name realized PnL (see the sketch after this list)
- it has been proved that this reward function leads training to converge to the optimal policy
- however, the learning process is slow because of the 0 intermediate rewards
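The zero-until-terminal structure is easy to mirror in code. The sketch below is illustrative only (the environment computes rewards internally); `position` and `prices` are hypothetical inputs, not part of the gym-cryptotrading API:

```python
from typing import List, Sequence

def realized_pnl_rewards(position: Sequence[float],
                         prices: Sequence[float]) -> List[float]:
    """Illustrative sketch of a realized PnL reward scheme.

    position[i] is the net holding l(t_i) - s(t_i) over [t_i, t_{i+1}];
    prices has one more entry than position. Every intermediate reward
    is 0; the final reward is the accumulated profit and loss.
    """
    pnl = sum(pos * (prices[i + 1] - prices[i])
              for i, pos in enumerate(position))
    rewards = [0.0] * len(position)
    rewards[-1] = pnl  # entire profit/loss is "realized" at episode end
    return rewards

# Example: long 1 unit throughout; price goes 100 -> 101 -> 99
print(realized_pnl_rewards([1.0, 1.0], [100.0, 101.0, 99.0]))
# [0.0, -1.0]
```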
```python
import gym
import gym_cryptotrading

env = gym.make('RealizedPnLEnv-v0')
```
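Once created, the environment can be driven like any Gym environment. The loop below assumes only the classic Gym API (`reset`/`step`/`action_space.sample`), not any gym-cryptotrading-specific methods:

```python
# Minimal rollout sketch, assuming the classic Gym API.
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy for illustration
    observation, reward, done, info = env.step(action)
    total_reward += reward  # stays 0 until the terminal realized PnL

print('Episode realized PnL:', total_reward)
```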