Adds Ray Workflow: Multiple Run Support, Distributed Hyperparameter Tuning, and Consistent Setup Across Local/Cloud #1301

glvov-bdai · 2024-10-25T09:45:30Z

Description

This PR adds Ray support, which enables a lot of really cool stuff by leveraging the existing Hydra support, including but not limited to:

Running several training runs at once, in parallel or consecutively, with minimal interaction
Using the same training setup everywhere (cloud and local) with minimal overhead
Tuning hyperparameters
Tuning hyperparameters in parallel on multiple GPUs and/or multiple GPU nodes
Simultaneously tuning model hyperparameters for different environments/agents
Resource isolation
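To make the idea behind the list above concrete, here is a minimal, standard-library-only sketch of how a hyperparameter grid can be expanded into Hydra-style `key=value` overrides and spread across GPUs. It is not the code in this PR, and it does not call Ray; in the actual workflow Ray handles scheduling and resource isolation. The config keys, the function names `expand_grid` and `assign_gpus`, and the round-robin placement are all assumptions made up for illustration.

```python
import itertools
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical search space. The dotted keys are illustrative Hydra-style
# paths, NOT names taken from this PR's config files.
SEARCH_SPACE: Dict[str, Sequence[object]] = {
    "agent.params.config.learning_rate": [1e-4, 3e-4],
    "agent.params.config.minibatch_size": [256, 512],
}


def expand_grid(space: Dict[str, Sequence[object]]) -> Iterator[List[str]]:
    """Yield one list of Hydra-style 'key=value' overrides per grid point."""
    keys = list(space)
    # itertools.product enumerates every combination of candidate values.
    for values in itertools.product(*(space[key] for key in keys)):
        yield [f"{key}={value}" for key, value in zip(keys, values)]


def assign_gpus(jobs: List[List[str]], num_gpus: int) -> List[Tuple[int, List[str]]]:
    """Pair each job with a GPU index, round-robin.

    This only illustrates the idea of keeping jobs on separate devices;
    it is an assumption for this sketch, not how Ray places work.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [(index % num_gpus, job) for index, job in enumerate(jobs)]


if __name__ == "__main__":
    all_jobs = list(expand_grid(SEARCH_SPACE))
    for gpu_index, overrides in assign_gpus(all_jobs, num_gpus=2):
        # Each line is what one training run's override list could look like.
        print(f"GPU {gpu_index}: {' '.join(overrides)}")
```

With two values per key, the grid above expands to four jobs, and with two GPUs each device receives two of them. A real tuner would typically sample or search the space adaptively rather than enumerate it exhaustively.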

I know this PR seems huge, but most of the code diff is config files, argparser changes, documentation, and comments.

My main project at BDAI is changing from RL to LfD effective November 1st, so I'm posting this PR as early as possible so I have as much time as possible to address comments.

It would be much appreciated if the NVIDIA folks could work with me to get this reviewed ASAP. I realize that this is a pretty big PR, but I also think that it adds a lot of cool functionality, and the merging process will go much more smoothly if I am able to devote time to this while at work. Thanks! ;)

Fixes #1190, #1213

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Screenshots

Checklist

I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there


garylvov · 2024-11-04T14:15:23Z

I also made this walk-through video for this integration. I'd be happy to link it in ray.rst; let me know what you think.

https://www.youtube.com/watch?v=z7MDgSga2Ho

jsmith-bdai

Did another pass through; I still need to try it out.

source/standalone/workflows/ray/isaac_ray_tune.py

source/standalone/workflows/ray/isaac_ray_util.py

docs/source/features/ray.rst

source/standalone/workflows/ray/isaac_ray_tune.py

docs/source/features/ray.rst

source/standalone/workflows/ray/hyperparameter_tuning/vision_cartpole_cfg.py

source/standalone/workflows/ray/isaac_ray_util.py


garylvov and others added 30 commits September 27, 2024 00:14

start

67122dc

add feature extraction

2d207b5

blank

1aa2832

further

e4c395f

add args

c53d987

Merge branch 'isaac-sim:main' into feature/hyperparam_tune

50862cc

formatting

905bed1

tweaks

6909501

fix

2577827

allow jobs to actually get scheduled

c21b2f5

add dockerfile

ceba315

formatting

7771439

tweaks

6563e1f

get gcp cluster working with ray, and isaac

b27092f

make bash command consistent

b94fe87

tweaks

9a525ec

formatting

e6e9f85

formatting

1187ca0

fix argparser

83cb89a

formatting

1885d0b

cleanup command

653b8ae

start argparser

dc9fb3f

sync

5f9f0dd

Merge branch 'isaac-sim:main' into feature/hyperparam_tune

3cde9e4

formatting

db975bc

cherrypick ResNet Cart from PR

c80d278

add extra point in readme

7fd0169

add note about saving

4dd48b1

fixes

873ea54

Merge branch 'isaac-sim:main' into feature/hyperparam_tune

93fbff3

glvov-bdai and others added 7 commits November 3, 2024 16:20

fix indent level on doc

40d7441

Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…

a754a23

…ai/IsaacLab into feature/hyperparam_tune

fix tensorboard cmd in docs

371940e

fix code block for extra deps

bb17a37

Fix dockerfile

5452e9e

Signed-off-by: garylvov <[email protected]>

fix error in documentation discovered during tutorial vid

f9b1f40

Merge branch 'feature/hyperparam_tune' of https://github.com/glvov-bd…

9800132

…ai/IsaacLab into feature/hyperparam_tune

jsmith-bdai reviewed Nov 4, 2024


garylvov and others added 15 commits November 4, 2024 20:22

Update docs/source/features/ray.rst

0f6c484

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update source/standalone/workflows/ray/isaac_ray_tune.py

2d5e14d

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update source/standalone/workflows/ray/isaac_ray_tune.py

7c56a6b

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update source/standalone/workflows/ray/isaac_ray_tune.py

acd03c1

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update docs/source/features/ray.rst

75c4f49

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update source/standalone/workflows/ray/isaac_ray_tune.py

8311bc6

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

Update source/standalone/workflows/ray/isaac_ray_tune.py

9302017

Co-authored-by: James Smith <[email protected]> Signed-off-by: garylvov <[email protected]>

address james' comments

34afd2a

delete old file and fix imports

70aa958

format

5ba620d

change top level to be caps'

80b5df5

fix docstrings and typos

5027656

Merge branch 'main' into feature/hyperparam_tune

b293b48

fix weird bolding thing

aa92a9d

fix emphasize lines and included files in rst

34f0908

glvov-bdai requested a review from jsmith-bdai November 5, 2024 04:11

glvov-bdai and others added 5 commits November 7, 2024 13:23

Merge branch 'main' into feature/hyperparam_tune

0856cd6

Merge branch 'main' into feature/hyperparam_tune

7cc587a

Merge branch 'main' into feature/hyperparam_tune

a38de8e

Merge branch 'main' into feature/hyperparam_tune

30a63ff

Merge branch 'main' into feature/hyperparam_tune

ad8161d

Signed-off-by: glvov-bdai <[email protected]>
