Fixes flash_attn + cascade attention_code to decoder Transformer bloc… #71

Merged

Conversation


@colon3ltocard (Contributor) commented Oct 10, 2024

  • Cascades attention_code down to the decoder Transformer blocks (this wasn't the case before!); see the first sketch after this list.
  • Fixes the flash_attn dim ordering. It still won't work on grids larger than 64x64 because unetrpp splits attention heads across channels rather than across the flattened physical (pixel, voxel) dims; see the second sketch after this list.
  • Adds num_heads_decoder to allow changing the number of attention heads in the decoder.
  • Adds a Dockerfile for working on EWC with A100 GPUs and flash_attn; an older CUDA version is needed, and documentation for that has been added.
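For context, a minimal, hypothetical sketch of what "cascading" the settings means: the decoder only honours attention_code and num_heads_decoder if its constructor forwards them to every block. Class and argument names below are illustrative assumptions, not the real unetrpp/py4cast API.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder Transformer block; the attention settings must be passed in."""
    def __init__(self, dim: int, num_heads: int, attention_code: str = "torch"):
        super().__init__()
        self.attention_code = attention_code  # e.g. "torch" or "flash" (assumed values)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

class Decoder(nn.Module):
    def __init__(self, dim: int, num_heads_decoder: int, attention_code: str, depth: int = 2):
        super().__init__()
        # The point of the fix: forward attention_code and num_heads_decoder
        # to every block instead of letting each block fall back to defaults.
        self.blocks = nn.ModuleList(
            DecoderBlock(dim, num_heads_decoder, attention_code) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return x

decoder = Decoder(dim=64, num_heads_decoder=4, attention_code="flash")
y = decoder(torch.randn(2, 128, 64))  # (batch, tokens, dim)
```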
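And a minimal sketch of the dim-ordering point, assuming the usual PyTorch layout of (batch, nheads, seqlen, headdim) inside the attention module: flash_attn_func expects (batch, seqlen, nheads, headdim), so the tensors must be transposed before the call and back afterwards. Shapes and tensor names are illustrative, not the actual unetrpp code; flash-attn requires a CUDA build and fp16/bf16 inputs.

```python
import torch
from flash_attn import flash_attn_func  # requires a CUDA build of flash-attn

batch, nheads, seqlen, headdim = 2, 8, 4096, 64

# Hypothetical projections already split into heads, in "PyTorch-style" layout.
q = torch.randn(batch, nheads, seqlen, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# flash_attn_func wants (batch, seqlen, nheads, headdim): swap the head and
# sequence dims before the call, then swap back to the original layout.
out = flash_attn_func(
    q.transpose(1, 2),
    k.transpose(1, 2),
    v.transpose(1, 2),
)
out = out.transpose(1, 2)  # back to (batch, nheads, seqlen, headdim)
```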

@colon3ltocard colon3ltocard requested a review from LBerth October 10, 2024 12:24
@LBerth LBerth merged commit 456461b into meteofrance:main Oct 10, 2024
1 check passed