Use PyTorch 2 optimized native attention #39
Hi, here is a pull request for a small speedup: attention is computed with the PyTorch 2 function `torch.nn.functional.scaled_dot_product_attention` when it is available, falling back to the existing implementation otherwise.
In my testing this makes the optimizer run about 10% faster.
The optimization is essentially copied from a recent version of nanoGPT.
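For reference, here is a minimal sketch of the pattern this PR describes, in the style nanoGPT uses: probe for the fused kernel with `hasattr` and fall back to the classic softmax attention on older PyTorch versions. This is an illustrative standalone function, not the actual diff; the `attention` helper and its `causal` parameter are hypothetical names for this sketch.

```python
import math
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=True):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # Use the fused PyTorch 2 kernel when available (added in PyTorch 2.0).
    if hasattr(F, "scaled_dot_product_attention"):
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    # Manual fallback for PyTorch < 2.0: softmax(QK^T / sqrt(d)) V.
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
    if causal:
        T = q.size(-2)
        mask = torch.ones(T, T, dtype=torch.bool, device=q.device).tril()
        scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

The fused kernel avoids materializing the full attention-score matrix in memory, which is where the speedup comes from; because the `hasattr` check is done once at call time, the same code runs unchanged on both old and new PyTorch versions.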