I am training with the mosaic-bert-base-uncased.yaml recipe on 8xA40s, with data created using the Mosaic-provided C4 script. I consistently get a loss spike, after which the loss stays stuck, around 10k-15k steps into training. The only change I made is using fp32 instead of bfloat16.
[Loss curve: spike at ~10k steps]
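For reference, this is the only diff from the stock recipe (a sketch only; the key name and the bf16 default follow the Composer-style YAML in the examples repo, so treat them as assumptions and check your copy):

```yaml
# mosaic-bert-base-uncased.yaml (excerpt) -- key name is an assumption
# based on the Composer-style recipes in the examples repo
precision: fp32  # changed from the recipe's bf16 default (amp_bf16)
```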
We did most of our benchmarking and testing in bf16, and chose hyperparameters that worked well with this setting.
Regarding fp32, we would recommend lowering the learning rate slightly as a first line of defense (barring potential environment-related issues); see the sketch below. If you are also experimenting with a larger architecture (e.g. BERT-Large), we would likewise recommend lowering the learning rate.
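As a concrete illustration, a lowered-LR override might look like the following (a sketch only; the optimizer name and the 5.0e-4 baseline are assumptions about the stock recipe, so check the values in your copy of the YAML):

```yaml
# Hypothetical excerpt: lower the peak LR when switching to fp32.
optimizer:
  name: decoupled_adamw  # assumed default optimizer in the recipe
  lr: 2.0e-4             # reduced from an assumed 5.0e-4 baseline
```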
Hi @jacobfulano. Thank you for the advice! I actually realized that my flash attention installation was broken; I am resolving that in another issue.