Releases: LovelyBuggies/MFG-RL-PIDL
Version 2.0.0
- Source code for the paper "A Hybrid Framework of Reinforcement Learning and Physics-Informed Deep Learning for Spatiotemporal Mean Field Games".
- Pure PIDL and RL+PIDL algorithms.
Version 1.0.0
Makes RL (DDPG) + PIDL work; we can now train three networks together for the three rewards.
Contributions:
- Using DDPG, the actor can output continuous speeds (sketched after this entry).
- Integrated with PIDL.
- Uses fictitious play to calculate the speed.
- Apart from plotting, no arrays are involved (everything is a network).
- Allows options such as starting with a supervised critic, turning PIDL training off, smooth plotting, etc.
- Provides notebooks for pure PIDL and for the critic.
Known issue: the separable reward depends on the actor's initial weights, so it sometimes fails to produce the expected results.
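A minimal sketch of the continuous-speed actor mentioned above, assuming a PyTorch MLP with a sigmoid-bounded output; the architecture, speed bound, and names are illustrative rather than the repo's code. With DDPG the policy is deterministic, so the actor can emit a speed directly instead of scoring discrete actions.

```python
import torch
import torch.nn as nn

class SpeedActor(nn.Module):
    """Deterministic DDPG-style actor mapping (x, t) to a continuous speed."""
    def __init__(self, u_max=1.0, hidden=64):
        super().__init__()
        self.u_max = u_max
        self.body = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xt):
        # Sigmoid keeps the speed in [0, u_max]; the bound itself is an assumption.
        return self.u_max * torch.sigmoid(self.body(xt))

actor = SpeedActor()
u = actor(torch.tensor([[0.3, 0.5]]))  # continuous speed at (x=0.3, t=0.5)
```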
Version 0.3.1
Replaces get_rho_from_u and the one-step supervised training with get_rho_network_from_u (sketched after the notes below).
Notes:
- Initialization still uses supervised learning.
- The rho array is not used in the outer loop, only for plotting.
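A hedged sketch of what a routine like get_rho_network_from_u could look like; the function name comes from this note, while the continuity-equation residual, the random collocation sampling, and the omission of initial/boundary terms are assumptions rather than the repo's implementation. The point is that the routine returns a trained rho network instead of a rho array plus one supervised step.

```python
import torch
import torch.nn as nn

def get_rho_network_from_u(u_fn, rho_net=None, n_iters=500, lr=1e-3):
    """Fit a density network rho(x, t) to a given speed function u_fn(x, t)."""
    if rho_net is None:
        rho_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                                nn.Linear(64, 64), nn.Tanh(),
                                nn.Linear(64, 1))
    opt = torch.optim.Adam(rho_net.parameters(), lr=lr)
    for _ in range(n_iters):
        # Random collocation points in [0, 1]^2; column 0 is x, column 1 is t.
        xt = torch.rand(256, 2, requires_grad=True)
        rho = rho_net(xt)
        flux = rho * u_fn(xt)
        # Residual of the continuity equation rho_t + (rho * u)_x = 0
        # (initial/boundary terms are omitted in this sketch).
        drho = torch.autograd.grad(rho.sum(), xt, create_graph=True)[0]
        dflux = torch.autograd.grad(flux.sum(), xt, create_graph=True)[0]
        loss = (drho[:, 1] + dflux[:, 0]).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return rho_net
```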
Version 0.3.0
Makes the non-separable case work with RL, training the actor, critic, and rho_net together; the separable case still has some problems.
Algorithm (sketched below):
- Every tau steps, train the critic.
- Train the actor.
- Train the rho network.
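A minimal sketch of that ordering; the helper names, tau value, and iteration count are placeholders, since the actual update rules live in the repo.

```python
def training_loop(train_critic, train_actor, train_rho_net,
                  n_iterations=1000, tau=10):
    """Only the ordering comes from the note above; the updates are callables."""
    for it in range(n_iterations):
        if it % tau == 0:      # critic is refreshed only every tau steps
            train_critic()
        train_actor()          # actor updated every iteration
        train_rho_net()        # rho network updated every iteration

# No-op wiring just to show the call pattern:
training_loop(lambda: None, lambda: None, lambda: None, n_iterations=4, tau=2)
```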
Version 0.2.2
Unlike V 0.2.1, this version replaces the actor and critic with u and V, but updates rho by one step in each iteration.
Notes:
- Change MFG_VI.py line 10 and value_iteration_ddpg.py lines 90-92 and 95-100 to switch among LWR, Non-SEP, and SEP.
- The rho network is supervised on (x, t) inputs (sketched after these notes).
- LWR needs a learning rate of 0.1; the other two need 0.001.
- The V 0.2.1 Non-SEP failure may have been caused by the learning rate.
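A hedged sketch of the supervised rho network on (x, t); the helper name, grid shape, and unit domain are assumptions, and only the one-step-per-outer-iteration idea and the learning rates come from the notes.

```python
import torch
import torch.nn as nn

def supervise_rho_network(rho_net, rho_grid, n_steps=1, lr=1e-3):
    """Regress rho_net(x, t) onto a target density grid."""
    n_x, n_t = rho_grid.shape
    xs = torch.linspace(0.0, 1.0, n_x)
    ts = torch.linspace(0.0, 1.0, n_t)
    X, T = torch.meshgrid(xs, ts, indexing="ij")
    xt = torch.stack([X.flatten(), T.flatten()], dim=1)   # (n_x * n_t, 2)
    target = rho_grid.reshape(-1, 1)
    opt = torch.optim.Adam(rho_net.parameters(), lr=lr)
    for _ in range(n_steps):   # one step per outer iteration in V 0.2.2
        loss = (rho_net(xt) - target).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return rho_net

# Example wiring with a random target grid; per the notes, use lr=0.1 for LWR
# and lr=0.001 for the other two cases.
rho_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
supervise_rho_network(rho_net, torch.rand(32, 32), lr=0.001)
```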
Version 0.2.1
Like the tag iterative-rl: training the three networks together and iterative RL both work for LWR, and we can get results from the inaccurate rho (which is where the .1 in the version comes from), but not for the non-separable case.
Version 0.2.0
For the ring road, like the tag make-lwr-work, this version updates the actor and critic synchronously (compared with V 0.1.0), based on an initially accurate rho.
Version 0.1.0
Deliveries:
- Makes DDPG work on a ring road.
- The actor uses a continuous (fake) critic rather than interpolation; the critic then uses the actor and the fake critic to learn.
Notes:
- No fictitious play for the actor yet, only for the u array (sketched after these notes).
- Routing.
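A minimal sketch of fictitious play on the u array; the running-average rule is the standard one, while the grid size and the random stand-in for the best-response speeds are purely illustrative. At this stage only the array is averaged, not the actor network itself.

```python
import numpy as np

def fictitious_play_update(u_avg, u_best_response, iteration):
    """Running average of the speed profile over iterations 0..iteration."""
    return (iteration * u_avg + u_best_response) / (iteration + 1)

u_avg = np.zeros((32, 32))            # speeds on an (x, t) grid
for k in range(10):
    u_br = np.random.rand(32, 32)     # stand-in for the DDPG best response
    u_avg = fictitious_play_update(u_avg, u_br, k)
```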
Version 0.1.0 alpha
This version is a preparation for V 0.1.0, using a single-link road and discretized actions with the parameter n_actions (sketched below).
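A hedged sketch of the discretized action space; only n_actions comes from the note, and the speed range and helper name are assumptions. Each action index maps to an evenly spaced speed, which the continuous DDPG actor in V 0.1.0 later made unnecessary.

```python
import numpy as np

def make_action_space(n_actions, u_max=1.0):
    """Evenly spaced candidate speeds in [0, u_max]."""
    return np.linspace(0.0, u_max, n_actions)

speeds = make_action_space(n_actions=5)
print(speeds)  # [0.   0.25 0.5  0.75 1.  ]
```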