Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eos_token config in PPOTrainer #2387

Open
kechunFIVE opened this issue Nov 23, 2024 · 3 comments · May be fixed by #2420 or #2528
Open

eos_token config in PPOTrainer #2387

kechunFIVE opened this issue Nov 23, 2024 · 3 comments · May be fixed by #2420 or #2528
Labels
✨ enhancement New feature or request 👶 good first issue Good for newcomers 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO

Comments

@kechunFIVE
Copy link

Feature request

It appears that the generation config in PPOTrainer does not set an eos_token, resulting in each generation process continuing until it reaches the maximum length before stopping, which is quite time-consuming.

Motivation

If the eos_token is set, it will significantly reduce the time spent during the generation phase.

Your contribution

i'm sorry

@qgallouedec
Copy link
Member

Thanks for reporting it. Would you like to open a PR to fix it?

@qgallouedec qgallouedec added ✨ enhancement New feature or request 👶 good first issue Good for newcomers 🏋 PPO Related to PPO labels Nov 26, 2024
@dame-cell dame-cell linked a pull request Nov 30, 2024 that will close this issue
5 tasks
@qgallouedec qgallouedec added the 🙋 help from community wanted Open invitation for community members to contribute label Dec 13, 2024
@dawidm
Copy link

dawidm commented Dec 28, 2024

@kechunFIVE, @qgallouedec, @dame-cell Seems that current code is inspired by https://iclr-blogposts.github.io/2024/blog/the-n-implementation-details-of-rlhf-with-ppo/, section General implementation details, 4.2. Authors tried to recreate results from early OpenAI work, but they say:

Note that in a more recent codebase https://github.com/openai/summarize-from-feedback, OpenAI does stop sampling when encountering EOS token (summarize_from_feedback/utils/experiment_helpers.py#L19). However in this work we aim to do a 1:1 replication, so we align the setting that could keep sampling even eos_token is encountered

Reading PPOTrainer implementation, it seems that stop_token/stop_token_id arguments should control when to stop generation:

  • When stop_token is set to eos value, everything that is after EOS is padded and therefore ignored in calculations (using masks). In that case, keeping generating after all sequences have EOS seems just wasting of time and resources. I've checked that between stopping on EOS and generating until max length, loss values are almost exact and minimal differences of the order of 1e-8 are caused by masked calculations.
  • When stop_token_id is set, sequences get padded after this token, so generation should stop there, not model's EOS.
  • The case when stop_token != None and stop_token != 'eos' are not supported, so they should probably raise an exception.

@dawidm dawidm linked a pull request Dec 28, 2024 that will close this issue
5 tasks
@Benjoyo
Copy link

Benjoyo commented Jan 15, 2025

The option to control the stop token should be added to all online trainer configs imho

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ enhancement New feature or request 👶 good first issue Good for newcomers 🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO
Projects
None yet
4 participants