fp8 bwd #108

Draft · wants to merge 10 commits into base: main_perf
Conversation

micmelesse
Collaborator

No description provided.

alexkranias-amd and others added 10 commits December 9, 2024 10:09
feat: added fp32 output to input_helper
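A minimal sketch of what this commit describes, assuming a hypothetical `input_helper` used by the tests: alongside the tensors in the test dtype, the helper also returns fp32 copies to serve as the reference for error checks (the name and signature here are illustrative, not the repo's actual API):

```python
import torch

def input_helper(batch, heads, seqlen, head_dim, dtype=torch.float16,
                 return_fp32=True, device="cuda"):
    # Random q/k/v in the test dtype, plus fp32 copies for a reference path.
    q = torch.randn(batch, heads, seqlen, head_dim, dtype=dtype, device=device)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    if return_fp32:
        return q, k, v, q.float(), k.float(), v.float()
    return q, k, v
```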

tests passing

feat: fp8 tests; a small amount of error remains
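A hedged sketch of the kind of fp8 test these commits describe: run attention with fp8 inputs, compare against an fp32 reference, and assert the max absolute error stays under the quoted bound. `triton_attention` is a hypothetical stand-in for the repo's kernel entry point, and `input_helper` is the hypothetical helper sketched above:

```python
import math
import pytest
import torch

FP8_DTYPES = [torch.float8_e4m3fnuz, torch.float8_e5m2]  # ROCm fnuz variant assumed

def reference_attention(q, k, v):
    # Plain fp32 attention as the accuracy baseline.
    s = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    return torch.matmul(torch.softmax(s, dim=-1), v)

@pytest.mark.parametrize("fp8_dtype", FP8_DTYPES)
def test_attention_fp8(fp8_dtype):
    q, k, v, q32, k32, v32 = input_helper(1, 4, 128, 64)  # hypothetical helper
    ref = reference_attention(q32, k32, v32)
    out = triton_attention(q.to(fp8_dtype), k.to(fp8_dtype), v.to(fp8_dtype))
    max_err = (out.float() - ref).abs().max().item()
    assert max_err < 0.1  # "small amount of error" per the commit messages
```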

added fp8e5m2 type

note: RuntimeError: "abs_cuda" not implemented for 'Float8_e4m3fnuz'
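The RuntimeError above is a PyTorch limitation rather than a kernel bug: elementwise ops such as abs are not implemented for the fp8 dtypes, so accuracy checks must upcast first. A minimal sketch of the failure and the workaround:

```python
import torch

ref = torch.randn(4, 4, device="cuda", dtype=torch.float16)
out_fp8 = ref.to(torch.float8_e4m3fnuz)  # quantize to fp8

# out_fp8.abs()  # RuntimeError: "abs_cuda" not implemented for 'Float8_e4m3fnuz'

# Workaround: upcast to a wider dtype before any elementwise math.
max_err = (out_fp8.to(torch.float32) - ref.to(torch.float32)).abs().max()
```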

enabled fp8 GEMMs
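A hedged sketch of the building block this commit enables, an fp8 GEMM in Triton: tl.dot takes the fp8 tiles directly and accumulates in fp32. This is a single-tile toy kernel; the block sizes and the e5m2 dtype are assumptions, not the PR's actual configuration:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fp8_gemm_kernel(a_ptr, b_ptr, c_ptr,
                    M: tl.constexpr, N: tl.constexpr, K: tl.constexpr):
    offs_m = tl.arange(0, M)
    offs_n = tl.arange(0, N)
    offs_k = tl.arange(0, K)
    a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])  # fp8 tile
    b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])  # fp8 tile
    c = tl.dot(a, b)                                            # fp32 accumulator
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)

a = torch.randn(32, 32, device="cuda").to(torch.float8_e5m2)
b = torch.randn(32, 32, device="cuda").to(torch.float8_e5m2)
c = torch.empty(32, 32, device="cuda", dtype=torch.float32)
fp8_gemm_kernel[(1,)](a, b, c, M=32, N=32, K=32)
```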

fix: error down to < 0.1

added another fp8 dtype

best accuracy is with no scaling

improved accuracy to within < 0.02; remaining issue related to torch-side casting
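A hedged illustration of the torch-side casting issue referenced in this commit: routing a cast to fp8 through an intermediate fp16 tensor rounds twice, which can disagree with a single fp32 → fp8 cast for values near rounding boundaries (the dtype choice is an assumption):

```python
import torch

x = torch.randn(4096, device="cuda")

direct   = x.to(torch.float8_e4m3fnuz)                    # fp32 -> fp8
indirect = x.to(torch.float16).to(torch.float8_e4m3fnuz)  # fp32 -> fp16 -> fp8

# The two paths can differ due to double rounding; keeping every cast
# explicit and single-step removes this source of error.
diff = (direct.float() - indirect.float()).abs().max()
print(diff.item())
```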

fix: passes if we allow v to be fp16 instead of fp8; otherwise error is < 0.1

all error is < 0.07

feat: added per head scaling tensors

progress towards implementing scaling tensors in kernel
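A minimal sketch of per-head scaling as these commits describe it, assuming amax-based scales: one scale per attention head maps that head's values into fp8 range, and the reciprocal ("descale") is handed to the kernel to undo it. FP8_MAX, the names, and the tensor layout are assumptions, not the PR's actual helpers:

```python
import torch

FP8_MAX = 240.0  # max finite value of float8_e4m3fnuz; an assumption

def per_head_quantize(x: torch.Tensor):
    # x: (batch, heads, seqlen, head_dim) in fp16/fp32
    amax = x.abs().amax(dim=(0, 2, 3), keepdim=True).float()  # one amax per head
    scale = FP8_MAX / amax.clamp(min=1e-12)
    x_fp8 = (x.float() * scale).to(torch.float8_e4m3fnuz)
    descale = (1.0 / scale).squeeze()  # shape (heads,), passed to the kernel
    return x_fp8, descale
```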

save

issue: error caused by acc += tl.dot(p.to(v.type.element_ty), v)
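A hedged, minimal reproduction of the pattern flagged in this commit: inside the inner attention loop, the fp32 probabilities p are cast down to v's element type before the dot. When v is fp8, that downcast is the suspected error source; per the earlier commit, the same code passes when v stays fp16. Block sizes are assumptions:

```python
import triton
import triton.language as tl

@triton.jit
def pv_accumulate(p_ptr, v_ptr, o_ptr,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                  BLOCK_D: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_D)
    p = tl.load(p_ptr + offs_m[:, None] * BLOCK_N + offs_n[None, :])  # fp32 probs
    v = tl.load(v_ptr + offs_n[:, None] * BLOCK_D + offs_d[None, :])  # fp8 or fp16
    acc = tl.zeros((BLOCK_M, BLOCK_D), dtype=tl.float32)
    # Matches the line from the commit message: with fp8 v the downcast of p
    # loses precision; with fp16 v the same code passes the tests.
    acc += tl.dot(p.to(v.type.element_ty), v)
    tl.store(o_ptr + offs_m[:, None] * BLOCK_D + offs_d[None, :], acc)
```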