
RL3 FinRL Task 1 Submission

In this repository, we present the submission of the RL3 team to the FinRL2024 Competition.

Our submission presents an Online Adaptation method based on OAMP [1], which selects online, from a pool of pretrained experts, the best expert to apply at each step.
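As an illustration of the idea, the sketch below shows a Hedge-style online selection rule over a pool of experts. The class name, the argmax selection, and the exponential-weights update are simplifications chosen for exposition; they do not reproduce the exact OAMP update rule of [1].

```python
import numpy as np

class OnlineExpertSelector:
    """Hedge-style online selection over a pool of pretrained experts.

    Simplified illustration of OAMP-like online model selection,
    not the exact update rule from [1].
    """

    def __init__(self, n_experts: int, lr: float = 0.1):
        self.weights = np.ones(n_experts)
        self.lr = lr

    def select(self) -> int:
        # Pick the expert with the highest current weight
        # (alternatively, sample proportionally to the weights).
        return int(np.argmax(self.weights))

    def update(self, rewards: np.ndarray) -> None:
        # `rewards` holds the reward each expert would have obtained
        # on the last step; higher reward -> larger weight.
        self.weights *= np.exp(self.lr * rewards)
        self.weights /= self.weights.sum()
```

At each trading step, every expert proposes an action, the action of the currently selected expert is executed, and the weights are refreshed with the (estimated) reward each expert would have obtained.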

To train the experts of the ensemble, we split the training dataset into 10 subsets, where each subset corresponds to a calendar day. The first two days are kept for validating the OAMP algorithm, and the next 7 days are used as training subsets. For each training subset, we train 3 different RL agents on the whole day: one DQN [2] agent, one PPO [3] agent, and one FQI [4] agent. For DQN and PPO we employ the implementations found in stable_baselines3 [5], whereas for FQI we employ our own implementation. To perform hyperparameter tuning, we evaluate each algorithm on the following day. In total, we train 21 agents, 3 for each day from the 9th to the 15th of April. Out of these 21 agents, we choose the 7 final agents of the ensemble by keeping, for each day, the best of DQN, PPO, and FQI according to their respective validation scores. We opt to use the first two days for validating the ensemble because the last two days show a market trend reversal; this way, we obtain a diverse set of experts that have seen diverse market regimes. The last day is kept as a final test of the ensemble.
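The per-day training and validation-based selection loop can be summarized as in the sketch below. It is only an outline under simplifying assumptions: make_day_env, evaluate, and train_fqi are hypothetical placeholders for the actual code in trade_simulator.py and the agent folder, and the timestep budget is illustrative.

```python
from datetime import date, timedelta
from stable_baselines3 import DQN, PPO

train_days = [date(2024, 4, d) for d in range(9, 16)]  # 9th to 15th of April

def make_day_env(day):
    """Hypothetical helper that wraps trade_simulator for a single calendar day."""
    raise NotImplementedError

def evaluate(agent, env):
    """Hypothetical helper that returns a validation score for an agent."""
    raise NotImplementedError

def train_fqi(env):
    """Placeholder for our FQI implementation (agent/fqi.py)."""
    raise NotImplementedError

ensemble = []
for day in train_days:
    train_env = make_day_env(day)
    val_env = make_day_env(day + timedelta(days=1))  # hyperparameter tuning on the next day

    # Train the three candidate agents on the whole day (budget is illustrative).
    candidates = [
        DQN("MlpPolicy", train_env).learn(total_timesteps=100_000),
        PPO("MlpPolicy", train_env).learn(total_timesteps=100_000),
        train_fqi(train_env),
    ]

    # Keep only the best candidate for this day, based on its validation score.
    ensemble.append(max(candidates, key=lambda agent: evaluate(agent, val_env)))
```

The resulting list of selected agents roughly corresponds to the 7 pretrained experts stored in the agents folder.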

For training and evaluating the DQN and PPO agents, we employ an episodic setting: in each episode, we pick a random starting state in the training day and run our agents for 480 steps of 2 seconds each. Therefore, our agents optimize the expected return over trading intervals of 16 minutes. We chose these values based on preliminary testing. For training FQI, we employ the same episodic setting. Since FQI needs an offline dataset of interactions as input, for each training day we generate a dataset composed of 1000 episodes collected with 4 baseline policies (a sketch of the collection procedure is given after the list):

  • Random Agent
  • Short Only Agent
  • Long Only Agent
  • Flat Only Agent
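The sketch below outlines how such an offline dataset could be collected with the baseline policies listed above; the environment interface (random starting state, 2-second steps), the policy interface, and the split of episodes across the four baselines are simplified assumptions rather than the exact code of agent/baselines.py.

```python
import random

EPISODE_STEPS = 480   # 480 steps of 2 seconds each -> 16-minute episodes
N_EPISODES = 1000     # episodes collected per training day

def collect_fqi_dataset(env, baseline_policies):
    """Roll out the baseline policies and store (s, a, r, s', done) transitions."""
    dataset = []
    for _ in range(N_EPISODES):
        # The split of episodes across the four baselines is illustrative.
        policy = random.choice(baseline_policies)
        state = env.reset()  # assumed to pick a random starting state in the day
        for _ in range(EPISODE_STEPS):
            action = policy.act(state)  # hypothetical policy interface
            next_state, reward, done, info = env.step(action)
            dataset.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return dataset
```

FQI then fits its Q-function by iterated regression on this fixed dataset, without further interaction with the simulator.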

The repository is organized as follows:

├── data                  # Folder containing the input datasets to trade_simulator
├── agents                # Folder containing the pretrained models used in the ensemble
├── agent                 # Folder containing the agent classes, used to train and load the agents
│   ├── base.py           # Base agent interface
│   ├── baselines.py      # Baseline agents used to generate the FQI datasets
│   ├── factory.py        # Agent factory used for loading and utilities
│   ├── fqi.py            # Implementation of the FQI agents
│   └── online_rl.py      # Implementation of the DQN and PPO agents
├── trade_simulator.py    # A slightly modified version of the simulator, needed for OAMP
├── task_1_ensemble.py    # Script for training, model selection, and saving of the ensemble agents
├── task_1_eval.py        # Script for running the evaluation of the ensemble with OAMP
├── requirements.txt      # Requirements needed to perform training and validation
└── readme.md             # This readme file

To execute our ensemble agent, it is sufficient to run task_1_eval.py, as we have already included all the configurations needed to load the agents of the ensemble and run OAMP. You only need to provide the updated BTC_1sec.csv and BTC_1sec_predict.npy files corresponding to the evaluation period.

[1] Antonio Riva, Lorenzo Bisi, Pierre Liotet, Luca Sabbioni, Edoardo Vittori, Marco Pinciroli, Michele Trapletti, and Marcello Restelli. Addressing non-stationarity in FX trading with online model selection of offline RL experts. In Proceedings of the Third ACM International Conference on AI in Finance, ICAIF '22, 2022.

[2] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.

[3] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[4] Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556, 2005.

[5] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1-8, 2021.