Comparison between PPO, SAC and TD3 #83

SuhanNShetty · 2023-06-22T12:17:54Z

Hi,

First of all, thanks for what looks like a fantastic library for RL. I am looking forward to testing it.

I have a question related to comparing the performance of SAC and TD3 with PPO.

In the examples section (https://skrl.readthedocs.io/en/latest/intro/examples.html), all the Isaac Gym examples use PPO as the RL algorithm, together with SequentialTrainer. I am curious why only PPO is used there. Why not SAC or TD3? Is there a comparison of how other state-of-the-art algorithms such as SAC or TD3 perform on complicated tasks in your implementation? Do these off-policy algorithms work as well as PPO (w.r.t. reward and wall-clock time) in your implementation?

One of the main reasons I was interested in using this library is that I could use different RL algorithms (other than PPO) with NVIDIA Isaac Gym environments (unlike other libraries such as rl_games). So I am looking forward to your answer to the above question.

Thanks,
Suhan

Toni-SM · 2023-06-23T21:10:00Z

Maintainer

Hi @SuhanNShetty

In the following Reddit post you can find some answers: Isaac Gym with Off-policy Algorithms

In the ddpg_td3_sac.zip file you can find the code for training DDPG, TD3 and SAC in the NVIDIA Omniverse Isaac Gym Ant environment.

Note that, unlike PPO (which is trained in 4096 parallel environments), DDPG, TD3 and SAC are configured to train in only 64 environments. You need to call the scripts as follows:

PATH_TO_ISAAC_SIM/python.sh ant_ddpg.py headless=True num_envs=64
PATH_TO_ISAAC_SIM/python.sh ant_td3.py headless=True num_envs=64
PATH_TO_ISAAC_SIM/python.sh ant_sac.py headless=True num_envs=64

Regarding the execution time, PPO training (in 4096 parallel environments) runs about 16 times faster than DDPG/TD3/SAC training (in 64 parallel environments).
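
As a rough illustration of the environment counts mentioned above, here is a minimal sketch in plain Python (not skrl code). It assumes that each vectorized simulation step yields one transition per parallel environment; the helper name `transitions_per_step` is hypothetical. It only compares data collected per step and says nothing about final reward or the measured 16x wall-clock difference.

```python
# Sketch only: assumes one transition per parallel environment per simulation step.
def transitions_per_step(num_envs: int) -> int:
    """Number of transitions gathered by one vectorized simulation step."""
    return num_envs


ppo_envs = 4096        # environment count quoted for PPO in this thread
off_policy_envs = 64   # environment count quoted for DDPG/TD3/SAC in this thread

# Ratio of data collected per simulation step between the two setups.
ratio = transitions_per_step(ppo_envs) / transitions_per_step(off_policy_envs)
print(f"PPO gathers {ratio:.0f}x more transitions per simulation step")
```

So, under that assumption, the PPO setup sees 64 times more transitions per simulation step than the 64-environment off-policy setups.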

You should see results similar to the following:


SuhanNShetty Jun 25, 2023
Author

Hi @Toni-SM, thanks for the quick response and the code you shared. Just curious, is there any reason why "64" was chosen as the number of environments for SAC/TD3?

I understand that PPO (an on-policy algorithm) can exploit GPU-based simulators like NVIDIA Isaac Gym much better than off-policy algorithms in terms of wall-clock time. However, what I am curious to know is whether, in your implementation, the policy from PPO performs better than the off-policy ones upon convergence. It would have been great if you had plotted curves for PPO along with the other methods in the above plot. Are there any benefits to using SAC or TD3 with NVIDIA Isaac Gym, then?
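
For readers less familiar with the on-policy/off-policy distinction raised here, the following is a minimal sketch of the replay buffer idea that off-policy methods such as SAC and TD3 rely on. It is plain Python, not skrl's memory classes; the `ReplayBuffer` class and the `(step, env_id)` placeholder transitions are hypothetical and only stand in for real (state, action, reward, next state) tuples.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of past transitions (illustrative, not skrl's API)."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently discards the oldest item when full.
        self.storage = deque(maxlen=capacity)

    def add(self, transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement from everything stored so far.
        return random.sample(list(self.storage), batch_size)

    def __len__(self) -> int:
        return len(self.storage)


buffer = ReplayBuffer(capacity=1000)

# Pretend to step 64 parallel environments for 10 simulation steps.
for step in range(10):
    for env_id in range(64):
        buffer.add((step, env_id))  # placeholder for a real transition

# An off-policy update can draw on old data, mixing transitions from many steps.
batch = buffer.sample(32)
```

An on-policy method like PPO, by contrast, discards its rollout data after each policy update, which is one reason it benefits from collecting large fresh batches across many parallel environments.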

Toni-SM Jun 26, 2023
Maintainer

Hi @SuhanNShetty

I just used the same hyperparameters and the same environment settings used in OmniIsaacGymEnvs:

AntSAC.yaml for configuring agents
Number of environments:
https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/blob/220d34c6b68d3f7518c4aa008ae009d13cc60c03/omniisaacgymenvs/cfg/task/AntSAC.yaml#L7-L8

You can find plotted curves for PPO here: #32 (comment)

According to the results, I don't see any benefit in using off-policy algorithms here, given the lower number of environments and the slower training times compared with PPO.
