add timer impl #44

Merged
merged 16 commits into from
Dec 14, 2023

@gshennvm gshennvm commented Dec 5, 2023

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@gshennvm gshennvm marked this pull request as draft December 5, 2023 15:27
@github-actions github-actions bot added the Utils label Dec 5, 2023
@gshennvm gshennvm force-pushed the geshen/timer branch 2 times, most recently from 17f1bca to 124d082 Compare December 5, 2023 23:35
@odelalleau odelalleau self-requested a review December 6, 2023 04:14

@odelalleau odelalleau left a comment

Looking good, just a few small questions / comments

examples/nlp/gpt/train_reward_model.py (outdated, resolved)
nemo_aligner/utils/distributed.py (outdated, resolved)
nemo_aligner/utils/distributed.py (resolved)
nemo_aligner/utils/distributed.py (outdated, resolved)
nemo_aligner/utils/distributed.py (resolved)
nemo_aligner/algorithms/supervised.py (resolved)

gshennvm commented Dec 8, 2023

will carry this change over to SFT, DPO, PPO and then undraft. thanks for taking an initial look!

@gshennvm gshennvm marked this pull request as ready for review December 8, 2023 22:59

gshennvm commented Dec 8, 2023

undrafted! i'm testing it as we speak but let me know what you think

gshennvm and others added 13 commits December 8, 2023 15:13
odelalleau previously approved these changes Dec 12, 2023

@odelalleau odelalleau left a comment

lgtm, just a minor suggestion

CHANGELOG.md (outdated, resolved)

@odelalleau odelalleau left a comment

Good to merge!

@trias702 trias702 merged commit 7f40425 into main Dec 14, 2023
5 checks passed
@gshennvm gshennvm deleted the geshen/timer branch December 15, 2023 19:21
HeyyyyyyG pushed a commit that referenced this pull request Dec 19, 2023
* add timer class

Signed-off-by: Gerald Shen <[email protected]>

* fixup! add timer class

Signed-off-by: Gerald Shen <[email protected]>

* change check progress function

Signed-off-by: Gerald Shen <[email protected]>

* add timer to rm

Signed-off-by: Gerald Shen <[email protected]>

* add timer to supervised trainer

Signed-off-by: Gerald Shen <[email protected]>

* Update examples/nlp/gpt/train_reward_model.py

Co-authored-by: Olivier Delalleau <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* Update nemo_aligner/utils/distributed.py

Co-authored-by: Olivier Delalleau <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* add logging when finished

Signed-off-by: Gerald Shen <[email protected]>

* Update nemo_aligner/utils/distributed.py

Co-authored-by: Olivier Delalleau <[email protected]>
Signed-off-by: Gerald Shen <[email protected]>

* add timer into sft examples

Signed-off-by: Gerald Shen <[email protected]>

* add timer onto dpo

Signed-off-by: Gerald Shen <[email protected]>

* add check progress onto PPO

Signed-off-by: Gerald Shen <[email protected]>

* add timer onto ppo

Signed-off-by: Gerald Shen <[email protected]>

* update changelog

Signed-off-by: Gerald Shen <[email protected]>

* update changelog

Signed-off-by: Gerald Shen <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Olivier Delalleau <[email protected]>

---------

Signed-off-by: Gerald Shen <[email protected]>
Co-authored-by: Olivier Delalleau <[email protected]>
Co-authored-by: trias702 <[email protected]>
gshennvm added a commit that referenced this pull request Jan 26, 2024
rohitrango pushed a commit to rohitrango/NeMo-Aligner that referenced this pull request Jun 25, 2024
3 participants