Fix flash attention GQA bug to use the dynamic size of the key/value tensors - used for eval/inference #3109
pr-cpu.yaml
on: pull_request
Matrix: pytest-cpu
Coverage Results / coverage (9s)
Artifacts
Produced during runtime
Name | Status | Size |
---|---|---|
coverage-cd54471d459d7f93f632e0693f391463850f0822-cpu-1.13.1 | Expired | 268 KB |
coverage-cd54471d459d7f93f632e0693f391463850f0822-cpu-2.1.0 | Expired | 268 KB |