[WIP] RL/DPO #935

winglian · 2023-12-10T21:35:23Z

No description provided.

flexchar · 2023-12-20T11:24:10Z

🔥

* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calc for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla pref dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs

winglian force-pushed the rl-trainer branch from 952ec14 to 3e7b5ba on January 1, 2024 17:29

winglian added 16 commits January 4, 2024 16:21

ipo-dpo trainer

5950831

fix missing abstract method

27d655f

chatml template, grad checkpointing kwargs support

8c9d925

fix steps calc for RL and add dataloader kwargs

1b4fb53

wip to fix dpo and start ppo

44494ce

more fixes

073c2a1

refactor to generalize map fn

6333719

fix dataset loop and handle argilla pref dataset

340bd88

set training args

4236db8

load reference model on separate gpu if more than one device

c6ba434

no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

9ccf60c

fixes for rl training

965262d

support for ipo from yaml

8429890

set dpo training args from the config, add tests

5917961

chore: lint

0e7f940

set sequence_len for model in test

17b986b

winglian force-pushed the rl-trainer branch from 3e7b5ba to 17b986b on January 4, 2024 21:21

add RLHF docs [skip ci]

45da2a8

winglian merged commit 1c37e2a into main Jan 4, 2024

winglian deleted the rl-trainer branch January 4, 2024 23:21
