add rlhf model link #63

HeyyyyyyG · 2023-12-18T23:36:21Z

What does this PR do?

Add a link to the NV-Llama2-70B-RLHF model.

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines.
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide, which contains the tutorials.

Checklist when contributing a new algorithm

Does the trainer resume and restore all model states?
Does the trainer support all parallelism techniques (PP, TP, DP)?
Does the trainer support max_steps=-1 and validation?
Does the trainer only call APIs defined in alignable_interface.py?
Does the trainer have proper logging?

Additional Information

Related to # (issue)

Signed-off-by: jiaqiz <[email protected]>

Signed-off-by: jiaqiz <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]>

…ligner into jiaqiz/add_rlhf_model_link Signed-off-by: Jiaqi Zeng <[email protected]>

* add timer class Signed-off-by: Gerald Shen <[email protected]>
* fixup! add timer class Signed-off-by: Gerald Shen <[email protected]>
* change check progress function Signed-off-by: Gerald Shen <[email protected]>
* add timer to rm Signed-off-by: Gerald Shen <[email protected]>
* add timer to supervised trainer Signed-off-by: Gerald Shen <[email protected]>
* Update examples/nlp/gpt/train_reward_model.py Co-authored-by: Olivier Delalleau <[email protected]> Signed-off-by: Gerald Shen <[email protected]>
* Update nemo_aligner/utils/distributed.py Co-authored-by: Olivier Delalleau <[email protected]> Signed-off-by: Gerald Shen <[email protected]>
* add logging when finished Signed-off-by: Gerald Shen <[email protected]>
* Update nemo_aligner/utils/distributed.py Co-authored-by: Olivier Delalleau <[email protected]> Signed-off-by: Gerald Shen <[email protected]>
* add timer into sft examples Signed-off-by: Gerald Shen <[email protected]>
* add timer onto dpo Signed-off-by: Gerald Shen <[email protected]>
* add check progress onto PPO Signed-off-by: Gerald Shen <[email protected]>
* add timer onto ppo Signed-off-by: Gerald Shen <[email protected]>
* update changelog Signed-off-by: Gerald Shen <[email protected]>
* update changelog Signed-off-by: Gerald Shen <[email protected]>
* Update CHANGELOG.md Co-authored-by: Olivier Delalleau <[email protected]>

---------

Signed-off-by: Gerald Shen <[email protected]>
Co-authored-by: Olivier Delalleau <[email protected]>
Co-authored-by: trias702 <[email protected]>

Signed-off-by: jiaqiz <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]>

…ligner into jiaqiz/add_rlhf_model_link Signed-off-by: Jiaqi Zeng <[email protected]>

add rlhf model link

a0322be

Signed-off-by: jiaqiz <[email protected]>

HeyyyyyyG requested a review from gshennvm December 18, 2023 23:36

gshennvm approved these changes Dec 19, 2023

HeyyyyyyG and others added 5 commits December 18, 2023 19:46

add rlhf model link

3c98b90

Signed-off-by: jiaqiz <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]>

Merge branch 'jiaqiz/add_rlhf_model_link' of github.com:NVIDIA/NeMo-A…

220b945

…ligner into jiaqiz/add_rlhf_model_link Signed-off-by: Jiaqi Zeng <[email protected]>

add rlhf model link

77367a5

Signed-off-by: jiaqiz <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]>

Merge branch 'jiaqiz/add_rlhf_model_link' of github.com:NVIDIA/NeMo-A…

fe2d308

…ligner into jiaqiz/add_rlhf_model_link Signed-off-by: Jiaqi Zeng <[email protected]>

HeyyyyyyG closed this Dec 19, 2023

HeyyyyyyG deleted the jiaqiz/add_rlhf_model_link branch December 19, 2023 04:00
