add sdpa and flash_attention2 support to speech2text #33716
I added SDPA and FlashAttention-2 support to the Speech2Text model (I started with SDPA, but because of the `# Copied from` comments I thought it would be easier to add FlashAttention-2 as well). The implementation is copied from BART.

Addresses #26350 and #28005.
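For reference, a minimal usage sketch of selecting the new attention backends at load time (assuming the `facebook/s2t-small-librispeech-asr` checkpoint; the placeholder audio and this snippet are illustrative and not taken from the PR's tests):

```python
import numpy as np
import torch
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

model_id = "facebook/s2t-small-librispeech-asr"
# "sdpa" uses torch.nn.functional.scaled_dot_product_attention;
# "flash_attention_2" additionally requires the flash-attn package and a supported GPU.
model = Speech2TextForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="sdpa",  # or "flash_attention_2"
    torch_dtype=torch.float16,
).to("cuda")

processor = Speech2TextProcessor.from_pretrained(model_id)
waveform = np.zeros(16000, dtype=np.float32)  # placeholder: 1 second of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to("cuda", dtype=torch.float16)

generated_ids = model.generate(input_features)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```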
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@amyeroberts @fxmarty
Notes:
- There was a bug in `Speech2TextSinusoidalPositionalEmbedding` which caused `test_eager_matches_sdpa_generate` to fail. I fixed it. The class was copied into `SpeechT5`, so I fixed the bug there as well (a minimal sketch of what this embedding computes follows these notes, for context).
- `test_flash_attn_2_generate_reuse_cache` was failing because it wasn't meant for the Speech2Text model. I copied and tweaked the test from Whisper.
- `test_flash_attn_2_from_config` fails with the error `ValueError: Unrecognized configuration class <class 'transformers.models.speech_to_text.configuration_speech_to_text.Speech2TextConfig'> for this kind of AutoModel: AutoModelForCausalLM.` Speech2Text doesn't have a causal LM model, and I couldn't find which other models are in the same situation or what they do with this test. BART has `BartForCausalLM`, so I would love some guidance here (see the reproduction sketch after these notes).
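For context on the first note, here is a minimal sketch of the sinusoidal table that `Speech2TextSinusoidalPositionalEmbedding` builds (fairseq-style layout; this is illustrative only and is not the actual fix in this PR):

```python
import math
import torch


def sinusoidal_table(num_positions: int, embedding_dim: int) -> torch.Tensor:
    # Inverse frequencies on a geometric scale, as in the original Transformer.
    half_dim = embedding_dim // 2
    freqs = torch.exp(
        torch.arange(half_dim, dtype=torch.float32) * -(math.log(10000.0) / (half_dim - 1))
    )
    angles = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1) * freqs.unsqueeze(0)
    # fairseq layout: all sines in the first half of the feature dim, all cosines in the second.
    table = torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)
    if embedding_dim % 2 == 1:
        # Zero-pad the last column when the embedding dimension is odd.
        table = torch.cat([table, torch.zeros(num_positions, 1)], dim=1)
    return table  # shape: (num_positions, embedding_dim)
```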
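And a sketch of what trips up `test_flash_attn_2_from_config` for this model: `Speech2TextConfig` has no registered `AutoModelForCausalLM` class, so instantiating one from the config raises the quoted error (illustrative reproduction, not part of the PR):

```python
from transformers import AutoModelForCausalLM, Speech2TextConfig

config = Speech2TextConfig()
# Speech2Text has no *ForCausalLM head, so this raises:
# ValueError: Unrecognized configuration class <class '...Speech2TextConfig'>
# for this kind of AutoModel: AutoModelForCausalLM.
model = AutoModelForCausalLM.from_config(config)
```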