
Confused about number of steps #76

Open
cinjon opened this issue Nov 25, 2024 · 3 comments

cinjon commented Nov 25, 2024

Hi, I saw your training curve for Gemma 9b SimPO here: https://wandb.ai/yumeng0818/simpo/runs/4w25j650?nw=nwuseryumeng0818.
How is it that there are only 92 steps? At a batch size of 128, that would be only ~11k total examples seen, but there are ~60k examples in the dataset.
Thanks.


cchenv commented Dec 9, 2024

Hi @cinjon, did you figure it out? It's confusing to me too. Also, the effective batch size seems to be 256 (2 × 8 × 16), so there should be about 232 steps for 1 epoch.
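The arithmetic behind the step counts in this thread can be sketched as follows (the ~60k dataset size and the two candidate batch sizes come from the comments above; the exact dataset size is an assumption, so the step counts are approximate):

```python
import math

def steps_per_epoch(num_examples: int, effective_batch_size: int) -> int:
    """Optimizer steps needed to see every example once in one epoch."""
    return math.ceil(num_examples / effective_batch_size)

# ~60k preference pairs, as mentioned in the thread
num_examples = 60_000

# At a batch size of 128 you'd expect ~469 steps per epoch;
# at 256 (2 * 8 * 16), about 235 -- either way far more than
# the 92 steps visible on the wandb curve.
print(steps_per_epoch(num_examples, 128))  # 469
print(steps_per_epoch(num_examples, 256))  # 235
```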


cinjon commented Dec 10, 2024

  • Still confused, but our training runs are reasonable, so I gave up trying to guess theirs.
  • Yeah, I was also unsure whether it was 128 or 256.
  • I'm also confused about their eval templates and scores on Gemma.


cchenv commented Dec 10, 2024

@cinjon I tried to use TRL's implementation (https://huggingface.co/docs/trl/cpo_trainer#simple-preference-optimization-simpo) for training runs, but I couldn't reproduce their Gemma2-9B-it-SimPO model; the resulting model after 1 epoch on the dataset is much worse. I noticed there is another PR by the authors to create a separate SimPOTrainer in TRL: huggingface/trl#1725. I hope that fixes the issue.
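For reference, a minimal sketch of running SimPO through TRL's `CPOTrainer`, per the TRL docs linked above (SimPO is selected with `loss_type="simpo"` and `cpo_alpha=0.0`). The model name, dataset name, and all hyperparameters below are placeholders for illustration, not necessarily the authors' exact setup, and the `CPOTrainer` argument names may differ across TRL versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model_name = "google/gemma-2-9b-it"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# SimPO via CPOTrainer: loss_type="simpo" selects the SimPO loss, and
# cpo_alpha=0.0 removes the extra BC term so the objective is pure SimPO.
config = CPOConfig(
    output_dir="gemma2-9b-simpo",
    loss_type="simpo",
    cpo_alpha=0.0,
    simpo_gamma=0.5,                  # target reward margin (placeholder value)
    beta=10.0,                        # placeholder value
    per_device_train_batch_size=2,    # matching the 2 * 8 * 16 = 256 guess above
    gradient_accumulation_steps=16,
    num_train_epochs=1,
)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = CPOTrainer(
    model=model,
    args=config,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```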
