forked from Dao-AILab/flash-attention
Support for Sliding Window Attention #109
Open
alexkranias-amd wants to merge 41 commits into main from alexkranias/sliding_window
Conversation
* flash_attn_func works Compress. This is a combination of 12 commits: add scripts save add our kernel import our kernel round trip use bshd layout figure out segfault fix show backward failure with prints save backward work run forward only test smallest config on everything add test fix remove pre commit install triton skip dropout pin d 32 factor d just run power of 2 remove timeout run serially clean up clean up 2
* Varlen works. This is a combination of 6 commits: save some tests passing enable more enable everything move around alibi works
* keep interface and kernel separate
* clean up
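For context, a minimal usage sketch of sliding-window attention through the flash_attn_func interface; this assumes the fork mirrors upstream's signature, where window_size=(left, right) enables local attention and (-1, -1) disables it, so treat the exact arguments as an assumption rather than this PR's final API.

```python
# Hypothetical usage sketch (assumes an upstream-style flash_attn_func signature
# with a window_size=(left, right) tuple; (-1, -1) means no sliding window).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)

# Causal attention where each query only sees the 128 most recent keys.
out = flash_attn_func(q, k, v, causal=True, window_size=(128, 0))
```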
* Compress kvcache work. This is a combination of 11 commits: kvcache work (a combination of 4 commits: kvcache is not supported save save decode save clean up merge) save cases save save save save key mask on triton side fix q size issue test combos save
* fix causal. use cache_seqlens
* clean and test what works
* some configs work on new_kv but fail on 1,8
* cache overwrite correct
* new_kv works more or less
* test local
* work on paged kv attention
* prefill paged attention
* fix has_batch_idx and skip local and rotary emb
* save
* save
* save
* save
* handle new_kv when paged kv cache
* all except has_batch_idx works
* major options are green
* test all
* add tests
* save
* clean up
* minor clean up
* simplest config
* save debug true
* save
* refactor slightly
* save work
* need key masking
* force hip
* use is_hip
* save
* fix cache_seq_len issue
* work on new_kv
* pass new_kv data
* save
* benchmark fwd only
* disable debug
* pandas pdf
* save
* set methods
* record number of heads
* use configs
* flexible dim, n-heads, headdim
* better benchmarking
* basic inplace update working
* works up to 64
* new_kv supported!
* test case for has_batch_idx
* has_batch_idx works!
* save
* save
* save
* save ref
* fix mqa and gqa by duplicating
* GQA and MQA working by kernel modifications
* fix new_kv with gqa
* cache index
* deal with nans on fwd_splitk
* save
* causal working on basic case
* causal works!
* alibi works!
* clean up
* clean prefill changes
* remove bwd stuff
* limit decode test to test_op_fwd
* add ref
* use bfloat
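The "fix mqa and gqa by duplicating" commits above refer to the usual reference-side trick of expanding the grouped KV heads to match the query heads. A small sketch of that idea (my reconstruction for reviewers, not the kernel code; expand_kv_heads is a made-up name):

```python
# Reference-side GQA/MQA handling by duplicating KV heads (sketch, not the PR's code).
import torch

def expand_kv_heads(k: torch.Tensor, v: torch.Tensor, nheads_q: int):
    """k, v: (batch, seqlen, nheads_k, headdim); requires nheads_q % nheads_k == 0."""
    nheads_k = k.shape[2]
    assert nheads_q % nheads_k == 0
    repeats = nheads_q // nheads_k
    # repeat_interleave keeps each group of query heads aligned with its shared KV head
    return (k.repeat_interleave(repeats, dim=2),
            v.repeat_interleave(repeats, dim=2))
```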
Fixes after rebase: rebase fixes, deal with kvcache failure, new run for branch cancel-in-progress, fix varlen_fwd bug
* Clean. This is a combination of 4 commits: clean 1 clean 2 clean more match main typo fix
* use is_hip()
* clean up more
* skip odd d only
* fix bug
* skip randomly
* use Flag
* update readme
* remove quantization
* remove bwd
* minor
* print
* remove verbose print
* quantize zeroes out the d stride
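One plausible shape for the is_hip() helper mentioned in the commits above (the helper in the tree may be implemented differently, e.g. via a Triton backend query):

```python
# Possible is_hip() implementation (assumption, not necessarily the repo's version).
import torch

def is_hip() -> bool:
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds
    return torch.version.hip is not None
```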
…n(torch.autograd.Function)
- added a pyskip for an odd case of using mha_type: "gqa"
- changed batch_size=1 and nheads=1
…flash-attention into alexkranias/sliding_window
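For reviewers, a naive PyTorch description of what the sliding-window kernel is supposed to compute, useful as a correctness check against the Triton output. The function name and the (left, right) window convention are assumptions, not code from this PR:

```python
# Naive sliding-window attention reference (for checking kernel outputs only).
import math
import torch

def sliding_window_attention_ref(q, k, v, window=(128, 0), causal=True):
    """q, k, v: (batch, seqlen, nheads, headdim); window=(left, right), -1 = unbounded."""
    b, s, h, d = q.shape
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / math.sqrt(d)
    i = torch.arange(s, device=q.device)[:, None]
    j = torch.arange(s, device=q.device)[None, :]
    left, right = window
    allowed = torch.ones(s, s, dtype=torch.bool, device=q.device)
    if left >= 0:
        allowed &= (i - j) <= left   # keys at most `left` positions behind the query
    if right >= 0:
        allowed &= (j - i) <= right  # keys at most `right` positions ahead of the query
    if causal:
        allowed &= j <= i
    scores = scores.masked_fill(~allowed, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return torch.einsum("bhqk,bkhd->bqhd", attn, v)
```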
TODO
This branch was built on top of an old rotary branch, which was a bad choice on my part (@alexkranias-amd) at the time. We have since made many changes to main_perf. I recommend rebasing the commits starting from `feat + test: sliding window partially working` and newer onto the most up-to-date main_perf.