
Releases: LovelyBuggies/MFG-RL-PIDL

Version 2.0.0

01 Nov 02:04
  • Source code for the paper "A Hybrid Framework of Reinforcement Learning and Physics-Informed Deep Learning for Spatiotemporal Mean Field Games".
  • Includes both the pure PIDL and the RL+PIDL algorithms.

Version 1.0.0

17 Oct 02:02

RL (DDPG) + PIDL now works: the three networks can be trained together for the three rewards.

Contributions:

  1. With DDPG, the actor can output continuous speeds.
  2. PIDL is now integrated.
  3. Fictitious play is used to compute the speed (see the sketch at the end of this entry).
  4. Arrays are only used for plotting; everything else is represented by networks.
  5. Options are available, such as starting with a supervised critic, turning PIDL training off, and smoothing the plots.
  6. Notebooks are provided for pure PIDL and for the critic.

Known issue: the separable reward depends on the actor's initial weights, so the expected results are sometimes not reached.
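The sketch below illustrates how the continuous-speed actor and the fictitious-play speed update could look, assuming a PyTorch implementation; the `Actor` class, layer sizes, and the `fictitious_play_speed` helper are hypothetical names for illustration, not the repository's actual code.

```python
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state (x, t) to a continuous speed in [0, 1]."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # bounded continuous speed
        )

    def forward(self, xt):
        return self.net(xt)

def fictitious_play_speed(u_avg, u_new, itr):
    """Fictitious-play update: blend the new best-response speed
    into the running average of past speeds."""
    return (itr * u_avg + u_new) / (itr + 1)
```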

Version 0.3.1

26 Sep 02:52

Replace get_rho_from_u and the one-step supervised training with get_rho_network_from_u (a sketch follows the notes below).

Notes:

  1. Initialization still uses supervised learning.
  2. The rho array is not used in the outer loop; it is only for plotting.
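A minimal sketch of what such a function could look like, assuming PyTorch: the network fits rho(x, t) to a given (n_x, n_t) density grid by supervised regression. The function name matches the release note, but its signature, the grid layout, and the hyper-parameters are assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

def get_rho_network_from_u(rho_grid, n_epochs=500, lr=1e-3):
    """Fit a network rho(x, t) to a density grid of shape (n_x, n_t),
    e.g. one obtained by rolling the dynamics forward with the speed u."""
    n_x, n_t = rho_grid.shape
    xs = torch.linspace(0, 1, n_x)
    ts = torch.linspace(0, 1, n_t)
    xt = torch.cartesian_prod(xs, ts)                              # (n_x * n_t, 2)
    target = torch.as_tensor(rho_grid, dtype=torch.float32).reshape(-1, 1)

    rho_net = nn.Sequential(
        nn.Linear(2, 32), nn.Tanh(),
        nn.Linear(32, 32), nn.Tanh(),
        nn.Linear(32, 1),
    )
    opt = torch.optim.Adam(rho_net.parameters(), lr=lr)
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(rho_net(xt), target)
        loss.backward()
        opt.step()
    return rho_net
```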

Version 0.3.0

22 Sep 22:56

Make the non-separable case work with RL, training the actor, critic, and rho_net together. The separable case still has problems.

Algorithm:

  1. Every tau steps, train the critic.
  2. Train the actor.
  3. Train the rho network.
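A minimal sketch of this outer loop, assuming PyTorch: the loss helpers (critic_loss_fn, actor_loss_fn, rho_loss_fn), optimizers, and the `tau` parameter are hypothetical stand-ins for whatever the repository actually uses.

```python
def train_step(itr, tau, actor, critic, rho_net,
               actor_loss_fn, critic_loss_fn, rho_loss_fn,
               actor_opt, critic_opt, rho_opt):
    """One outer-loop iteration: the critic is updated every `tau` steps,
    while the actor and the rho network are updated every step."""
    if itr % tau == 0:
        critic_opt.zero_grad()
        critic_loss_fn(critic, actor, rho_net).backward()
        critic_opt.step()

    actor_opt.zero_grad()
    actor_loss_fn(actor, critic, rho_net).backward()
    actor_opt.step()

    rho_opt.zero_grad()
    rho_loss_fn(rho_net, actor).backward()  # e.g. a PIDL residual on the density dynamics
    rho_opt.step()
```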

Version 0.2.2

15 Sep 17:08

Unlike v0.2.1, this version replaces the actor and critic with u and V, but updates rho by one step in each iteration.

Notes:

  1. Change MFG_VI.py line 10 and value_iteration_ddpg.py lines 90-92 and 95-100 to switch between LWR, Non-SEP, and SEP.
  2. The rho network is trained in a supervised way on (x, t) inputs.
  3. LWR needs a learning rate of 0.1; the other two need 0.001.
  4. The Non-SEP failure in v0.2.1 may have been caused by the learning rate.

Version 0.2.1

14 Sep 23:49

Like tag iterative-rl: training the three networks together and iterative RL both work for LWR, and we can get results even from an inaccurate rho (which is where the .1 comes from), but neither works for the non-separable case.

Version 0.2.0

14 Sep 23:43

For the ring road, like tag make-lwr-work, this version updates the actor and critic synchronously (unlike v0.1.0), starting from an initially accurate rho.

Version 0.1.0

15 Aug 19:13

Deliveries

  • Make DDPG work on a ring road.
  • The actor uses a continuous (fake) critic rather than interpolation; the critic then learns from the actor and the fake critic (see the sketch after the notes).

Notes

  • No fictitious play for actors yet; it is only applied to the u array.
  • Routing.
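A minimal sketch of the actor update using a continuous critic instead of interpolating a tabulated value function, assuming PyTorch; the `actor_update` helper and its arguments are hypothetical names for illustration.

```python
import torch

def actor_update(actor, critic, actor_opt, states):
    """Update the actor by querying the critic at continuous (state, speed)
    pairs rather than interpolating a value table over a grid."""
    actor_opt.zero_grad()
    speeds = actor(states)                                   # continuous speeds
    loss = -critic(torch.cat([states, speeds], dim=1)).mean()
    loss.backward()
    actor_opt.step()
    return loss.item()
```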

Version 0.1.0 alpha

15 Sep 00:05

This version is a preparation for v0.1.0, using a single-link road and discretized actions with the parameter n_actions.