Fix flash attention GQA bug to use the dynamic size of the key/value tensors - used for eval/inference #3110

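The PR title implies the shape of the fix: when expanding grouped key/value heads for grouped-query attention (GQA), the repeat factor and sequence length should be read from the key/value tensors themselves rather than from a static or query-side shape, so that a KV cache whose length differs from the current query length (as during eval/inference decoding) still works. A minimal NumPy sketch of that idea, with illustrative names not taken from the repository:

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_q_heads: int) -> np.ndarray:
    """Expand grouped key/value heads to match the query head count.

    kv: (batch, n_kv_heads, kv_len, head_dim). Both the repeat factor
    and kv_len are derived from kv's own (dynamic) shape -- not from
    the query tensor -- so a cached KV sequence longer than the
    current query works during eval/inference.
    """
    batch, n_kv_heads, kv_len, head_dim = kv.shape
    n_rep = n_q_heads // n_kv_heads
    if n_rep == 1:
        return kv
    # Insert a repeat axis and broadcast, then fold it into the head axis.
    expanded = np.broadcast_to(
        kv[:, :, None, :, :],
        (batch, n_kv_heads, n_rep, kv_len, head_dim),
    )
    return expanded.reshape(batch, n_q_heads, kv_len, head_dim)

# During decoding the cache holds kv_len = 7 past tokens while the new
# query has length 1; shapes come from kv, so the expansion still works.
k = np.random.randn(2, 4, 7, 16)   # 4 KV heads
print(repeat_kv(k, 32).shape)      # -> (2, 32, 7, 16)
```

The bug class this guards against is hard-coding the key/value sequence length to the query length, which holds in training (full sequences) but breaks once an inference-time KV cache makes the two lengths diverge.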