bug-fix: account for gradient accumulation steps in !sample_packing_eff_est sample_packing case #978

Closed
wants to merge 1 commit into from


kallewoof
Contributor

This addresses issues such as the effective warmup_ratio becoming a multiple of the gradient accumulation steps.

See also #977.

@kallewoof
Contributor Author

Actually, I'm not comfortable enough to say this is ready for merging, but I'm using this patch locally to address the warmup ratio being off. Perhaps the fix should be applied specifically to the warmup calculation for now, until this is sorted out.

@winglian
Collaborator

@kallewoof warmup_ratio is applied to the total number of steps over all epochs, i.e. the entire training run. How many GPUs are you using, what hyperparams are you using, and at what step are you seeing warmup peak?

@kallewoof
Contributor Author

kallewoof commented Dec 21, 2023

@winglian I can post more detailed examples in a few days, but this is what I get with the above patch applied:

  • Peak learning rate (warmup endpoint, where learning rate = the specified value in axolotl.yml): epoch 1.0
  • Gradient accumulation steps: 2
  • Micro batch size: 2
  • Epochs: 4
  • Warmup ratio: 0.25

On main I hit peak learning rate at epoch 2.0.

I will try other gradient accumulation step (gas) values with and without the patch and post more details if needed (but as noted, it will be a few days before I get to it, I'm afraid).

Edit: I should also note that the actual number of training steps remains the same with and without the patch.
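
For illustration, the numbers above are consistent with the warmup step count being derived from a total-step estimate that ignores gradient accumulation. The sketch below is only a guess at the mechanism, not the actual axolotl code: the function name, the `count_gas` flag, the single-GPU setup, and the dataset size of 1000 samples are all hypothetical. It assumes warmup steps are computed as the ceiling of `total_steps * warmup_ratio`.

```python
import math


def warmup_end_epoch(num_samples, micro_batch_size, gas, epochs,
                     warmup_ratio, count_gas=True):
    """Return the epoch at which linear warmup would end.

    Hypothetical model: count_gas=False mimics a total-step estimate
    that forgets to divide by the gradient accumulation steps.
    A single GPU is assumed.
    """
    # Optimizer steps actually taken per epoch.
    steps_per_epoch = math.ceil(num_samples / (micro_batch_size * gas))

    if count_gas:
        estimated_total_steps = steps_per_epoch * epochs
    else:
        # Over-estimate: counts micro-batches instead of optimizer steps.
        estimated_total_steps = math.ceil(num_samples / micro_batch_size) * epochs

    warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)
    return warmup_steps / steps_per_epoch


# Settings from the comment: micro batch size 2, gas 2, 4 epochs, ratio 0.25.
patched = warmup_end_epoch(1000, 2, 2, 4, 0.25, count_gas=True)
unpatched = warmup_end_epoch(1000, 2, 2, 4, 0.25, count_gas=False)
print(patched, unpatched)
```

Under these assumptions the warmup endpoint moves from epoch 1.0 to epoch 2.0 when the estimate ignores gas = 2, which matches the behaviour reported above, while the number of optimizer steps actually taken stays the same.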

@kallewoof
Contributor Author

Will revisit this later unless someone fixes it in the meantime.

kallewoof closed this Jan 24, 2024
kallewoof deleted the 202312-steps-gas branch January 24, 2024 06:31

2 participants