Summary
Currently, for both delayed and dynamic linear, we have made a memory-versus-computation trade-off: we always save the fp8-casted version of the weight for backward.
On a single node, this causes both the high-precision model.weight tensor and the casted Float8Tensor version of the weight to live in GPU memory until the casted weight is freed during the backward pass.
We would like to provide an option to disable this.
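A minimal sketch of how such an option could look (hypothetical, not the library's actual code: `ToyCastedLinear`, `cast_to_lowp`, and the `save_casted_weight` flag are made-up names, and bfloat16 stands in for fp8) is an autograd.Function that either saves the casted weight or re-casts it during backward:

```python
import torch

def cast_to_lowp(w: torch.Tensor) -> torch.Tensor:
    # Stand-in for the fp8 cast; bfloat16 is used purely for illustration.
    return w.to(torch.bfloat16)

class ToyCastedLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w, save_casted_weight: bool):
        w_lowp = cast_to_lowp(w)
        ctx.save_casted_weight = save_casted_weight
        ctx.w_dtype = w.dtype
        if save_casted_weight:
            # Current behavior: keep the casted copy alive for backward,
            # so both w and w_lowp occupy memory until backward runs.
            ctx.save_for_backward(x, w_lowp)
        else:
            # Proposed option: save only the high-precision weight and
            # re-cast during backward (less memory, more compute).
            ctx.save_for_backward(x, w)
        return x @ w_lowp.to(x.dtype).t()

    @staticmethod
    def backward(ctx, grad_out):
        x, w_saved = ctx.saved_tensors
        w_lowp = w_saved if ctx.save_casted_weight else cast_to_lowp(w_saved)
        grad_x = grad_out @ w_lowp.to(grad_out.dtype)
        grad_w = (grad_out.t() @ x).to(ctx.w_dtype)
        return grad_x, grad_w, None

# Usage (shapes and flag value are illustrative)
x = torch.randn(8, 1024, requires_grad=True)
w = torch.nn.Parameter(torch.randn(1024, 1024))
out = ToyCastedLinear.apply(x, w, False)  # False: re-cast in backward, don't keep the casted copy
out.sum().backward()
```

The proposed option would correspond to the `save_casted_weight=False` path: one extra cast per backward in exchange for not keeping two copies of the weight alive between forward and backward.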
This becomes more of an issue with the existing FSDP implementation. The high-precision weight is sharded across the GPUs. During forward, we all-gather the shards so that each GPU holds the unsharded weight. But since the nn.Parameter is not the tensor used for the matmul (the casted Float8Tensor is), FSDP's existing mechanism for freeing the unsharded weight does not kick in, and we end up saving the unsharded casted tensor for backward on each device. This extra memory usage scales with the number of GPUs.
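To see why FSDP's resharding does not help here, a small standalone repro (again hypothetical, with bfloat16 standing in for fp8) can use `torch.autograd.graph.saved_tensors_hooks` to show that what autograd keeps for backward is the casted copy, a separate allocation from the nn.Parameter that FSDP frees after forward:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 1024, dtype=torch.bfloat16, requires_grad=True)
w = torch.nn.Parameter(torch.randn(1024, 1024))  # stands in for the (unsharded) high-precision weight
w_lowp = w.detach().to(torch.bfloat16)           # stands in for the fp8 cast

def pack(t):
    # Called once for each tensor that autograd decides to keep for backward.
    print("saved for backward:", tuple(t.shape), t.dtype)
    return t

with torch.autograd.graph.saved_tensors_hooks(pack, lambda t: t):
    out = F.linear(x, w_lowp)

# The casted (bf16 here, fp8 in the real case) weight shows up among the saved
# tensors. FSDP can free the all-gathered nn.Parameter after forward, but this
# separately allocated, unsharded-sized copy stays alive on each rank until
# backward runs, which is the extra memory described above.
out.sum().backward()
```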