add Optimi(more fused_background_optimizer and new function) #1381

sdbds · 2024-06-24T01:57:20Z

https://optimi.benjaminwarner.dev/

New optimizers:
AdamW, Lion, Ranger, and StableAdamW from Optimi

New functions:
Low Precision Training with Kahan Summation (enabled automatically when using the optimizers above)
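To illustrate why Kahan summation matters for low precision training, here is a minimal pure-Python sketch. It is not Optimi's code: `to_bf16`, `naive_updates`, and `kahan_updates` are hypothetical helpers that emulate bfloat16 rounding with `struct`, and the weight and update values are made up for the demonstration.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a Python float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for the dropped 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def naive_updates(weight: float, update: float, steps: int) -> float:
    """Apply the same small update repeatedly, rounding to bf16 every step."""
    w, u = to_bf16(weight), to_bf16(update)
    for _ in range(steps):
        w = to_bf16(w + u)
    return w


def kahan_updates(weight: float, update: float, steps: int) -> float:
    """Same updates, but carry the rounding error in a bf16 compensation buffer."""
    w, u = to_bf16(weight), to_bf16(update)
    comp = 0.0
    for _ in range(steps):
        comp = to_bf16(comp + u)                   # fold the update into the buffer
        new_w = to_bf16(w + comp)                  # try to apply it to the weight
        comp = to_bf16(comp + to_bf16(w - new_w))  # keep whatever was rounded away
        w = new_w
    return w


# Hypothetical numbers: 1000 updates of 1e-3 applied to a weight of 1.0.
naive = naive_updates(1.0, 1e-3, 1000)
kahan = kahan_updates(1.0, 1e-3, 1000)
print(f"naive bf16: {naive}, with Kahan compensation: {kahan}")
```

In this toy setup each update is smaller than half the spacing between bfloat16 values near 1.0, so the naive loop never moves the weight, while the compensated loop ends close to the exact result of 2.0.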

Gradient Release (same as the fused backward pass)
Enabled automatically when using the optimizers above.

Fully Decoupled Weight Decay (weight decay decoupled from the learning rate; looks like decoupled lr)
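As a rough sketch of the difference, assuming fully decoupled weight decay scales the decay by the ratio of the current learning rate to the maximum learning rate instead of by the learning rate itself: the function and parameter names below (`coupled_decay`, `fully_decoupled_decay`, `max_lr`) are illustrative and not taken from Optimi, and the numbers are made up.

```python
def coupled_decay(weight: float, lr: float, weight_decay: float) -> float:
    """Usual AdamW-style decay: the shrink factor is lr * weight_decay."""
    return weight * (1.0 - lr * weight_decay)


def fully_decoupled_decay(
    weight: float, lr: float, max_lr: float, weight_decay: float
) -> float:
    """Assumed fully decoupled form: only the schedule ratio lr / max_lr scales the decay."""
    return weight * (1.0 - (lr / max_lr) * weight_decay)


# With lr = max_lr = 1e-3, a coupled decay of 1e-2 shrinks the weight by
# about the same amount as a fully decoupled decay of 1e-5.
coupled = coupled_decay(1.0, lr=1e-3, weight_decay=1e-2)
decoupled = fully_decoupled_decay(1.0, lr=1e-3, max_lr=1e-3, weight_decay=1e-5)

# Raising the peak learning rate tenfold changes the coupled decay,
# but leaves the fully decoupled decay untouched.
coupled_big_lr = coupled_decay(1.0, lr=1e-2, weight_decay=1e-2)
decoupled_big_lr = fully_decoupled_decay(1.0, lr=1e-2, max_lr=1e-2, weight_decay=1e-5)
print(coupled, decoupled, coupled_big_lr, decoupled_big_lr)
```

The practical consequence, under this assumption, is that weight decay values tuned for the coupled form would need to be much smaller when the fully decoupled form is used.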

Optimizer Accumulation
Gradient accumulation reduces training memory by splitting a batch into micro-batches and accumulating micro-batch gradients into the larger batch. Gradient release reduces training memory by limiting gradients to one layer at any given time. Optimizer accumulation unifies these two disparate approaches by accumulating gradients directly into optimizer states while performing gradient release.
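The idea can be pictured with a toy example. This is a simplified sketch, not Optimi's implementation: it uses plain SGD with momentum on scalar "layers" because the momentum update is linear in the gradient, so folding micro-batch gradients straight into the momentum state gives the same result as classic gradient accumulation. For optimizers with nonlinear states (such as Adam's second moment) the result may only be an approximation. All names and numbers are hypothetical.

```python
def gradient_accumulation(weights, momentum, micro_grads, lr, beta):
    """Baseline: keep a gradient buffer for every layer, then step once."""
    weights, momentum = dict(weights), dict(momentum)
    k = len(micro_grads)
    grad_buffer = {name: 0.0 for name in weights}  # one buffer per layer
    for grads in micro_grads:
        for name, g in grads.items():
            grad_buffer[name] += g / k
    for name in weights:
        momentum[name] = beta * momentum[name] + grad_buffer[name]
        weights[name] -= lr * momentum[name]
    return weights, momentum


def optimizer_accumulation(weights, momentum, micro_grads, lr, beta):
    """Fold each layer's gradient into the momentum state as soon as it exists."""
    weights, momentum = dict(weights), dict(momentum)
    k = len(micro_grads)
    for i, grads in enumerate(micro_grads):
        first, last = i == 0, i == k - 1
        # A backward pass visits layers in reverse order. In real training a
        # hook would hand over one layer's gradient at a time; here the
        # gradients are simply looked up in a dict.
        for name in reversed(list(grads)):
            g = grads[name]
            if first:
                momentum[name] *= beta        # decay once per full batch
            momentum[name] += g / k           # accumulate into the optimizer state
            del g                             # gradient "released": no buffer is kept
            if last:
                weights[name] -= lr * momentum[name]  # step on the final micro-batch
    return weights, momentum


weights = {"layer_a": 1.0, "layer_b": -2.0}
momentum = {"layer_a": 0.5, "layer_b": 0.25}
micro_grads = [
    {"layer_a": 0.2, "layer_b": -0.4},
    {"layer_a": 0.6, "layer_b": 0.8},
]

w_ref, m_ref = gradient_accumulation(weights, momentum, micro_grads, lr=0.1, beta=0.9)
w_acc, m_acc = optimizer_accumulation(weights, momentum, micro_grads, lr=0.1, beta=0.9)
print(w_ref, w_acc)
```

The second function never holds more than one layer's gradient, which is where the memory saving described above would come from.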

sdbds · 2024-06-24T01:59:01Z

The problem now is that VRAM usage keeps increasing after registering the hooks with prepare_for_gradient_release(). I'm not sure what causes it at the moment...

feffy380 · 2024-06-30T09:11:11Z

Some of optimi's features do not support fp16. They should not replace the original optimizers.

sdbds · 2024-07-01T12:59:45Z

> Some of optimi's features do not support fp16. They should not replace the original optimizers.

Sure, I will add an fp16 check, thanks.

feffy380 · 2024-07-01T13:06:33Z

I'm saying it should not be considered a substitute for the original optimizers. Some features do work with fp16 (like low precision training), but others don't.

You don't need an fp16 check, you need to treat the optimi versions as completely separate optimizers, the same way Adamw8bit is considered separate from Adamw.

People can also just load them with --optimizer_type optimi.Adamw. Only gradient release and optimizer accumulation need special handling.

sdbds added 4 commits June 15, 2024 22:56

init

a959567

Update requirements.txt

4a27ba0

Update requirements.txt

f75452b

add hook

ed99b21

sdbds mentioned this pull request Jun 26, 2024

When using accelerator with Gradient Release hook, Increasing VRAM consumption warner-benjamin/optimi#5

Open
