Support transformers 4.43 #1971

Merged
echarlaix merged 32 commits into main from support-transformers-4.43
Aug 5, 2024

Conversation

Member

@IlyasMoutawwakil IlyasMoutawwakil commented Jul 25, 2024

What does this PR do?

This PR adds support for transformers 4.43, which introduces the following issues:

  • CLIP models now use SDPA attention.
  • Pipelines now call model.dtype to cast inputs to the correct floating-point precision.
  • Bark models can't be saved due to shared tensors (tracked in transformers#32224, "BarkModel can't be saved anymore").
  • Whisper introduces a new forward input argument, cache_position.
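
To illustrate the model.dtype item: an ONNX Runtime model has no torch parameters to read a dtype from, so one option is to derive it from the declared input types. The sketch below is an assumption about how that could look, not the code merged in this PR; `infer_float_dtype` is a hypothetical helper, and the type strings follow the `tensor(...)` form that ONNX Runtime reports for session inputs.

```python
from typing import Iterable

# ONNX Runtime reports input types as strings such as "tensor(float16)".
# Only floating-point types matter when a pipeline casts its inputs.
_FLOAT_TYPES = {
    "tensor(float16)": "float16",
    "tensor(float)": "float32",
    "tensor(double)": "float64",
}


def infer_float_dtype(input_types: Iterable[str], default: str = "float32") -> str:
    """Hypothetical helper: return the dtype of the first floating-point input.

    Falls back to `default` when every input is an integer or boolean tensor.
    """
    for type_str in input_types:
        if type_str in _FLOAT_TYPES:
            return _FLOAT_TYPES[type_str]
    return default


# Example: a model whose first input is token ids and whose second is fp16.
example_dtype = infer_float_dtype(["tensor(int64)", "tensor(float16)"])
```

With a real session, the strings would come from the `type` attribute of each entry returned by `InferenceSession.get_inputs()`.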

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil IlyasMoutawwakil mentioned this pull request Jul 25, 2024
3 tasks
@IlyasMoutawwakil
Member Author

OwlViT and OwlV2 need an upgrade to their ONNX config (opset 11 -> 12, for the Einsum operator). I will push it once the onnxruntime tests finish, to see if there's anything else left.
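
As a rough illustration of why the opset has to move: Einsum was added to the ONNX operator set in opset 12, so a config pinned to opset 11 cannot export a graph that contains it. The helper below is a hypothetical sketch, not part of optimum; `MIN_OPSET` and `required_opset` are names made up for this example.

```python
from typing import Iterable

# Opset in which an operator first became available in the ONNX spec.
# Only Einsum is listed because it is the operator mentioned above.
MIN_OPSET = {"Einsum": 12}


def required_opset(default_opset: int, operators: Iterable[str]) -> int:
    """Hypothetical helper: smallest opset that satisfies every operator used."""
    needed = [MIN_OPSET.get(op, 1) for op in operators]
    return max([default_opset] + needed)


# A config that defaulted to opset 11 must be raised to 12 once Einsum appears.
owl_opset = required_opset(11, ["Einsum"])
```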

Collaborator

@echarlaix echarlaix left a comment

Thanks a lot for taking care of this @IlyasMoutawwakil

optimum/exporters/utils.py (resolved)
optimum/onnxruntime/modeling_diffusion.py (outdated, resolved)
@IlyasMoutawwakil
Member Author

should be all fixed now

@IlyasMoutawwakil IlyasMoutawwakil requested a review from dacorvo July 29, 2024 15:12
Collaborator

@echarlaix echarlaix left a comment

Looks great thanks a lot @IlyasMoutawwakil

optimum/utils/input_generators.py (outdated, resolved)
optimum/exporters/onnx/config.py (outdated, resolved)
        if use_torch is True:
            cache_position = cache_position.to(self.device)

        return use_cache_branch_tensor, past_key_values, cache_position
Collaborator

this is a breaking change so we should be careful, not sure this method is used by anyone though

Member Author

the method is only used by the forward pass, I don't think any sub packages use it
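
For readers wondering why the extra return value counts as breaking, here is a minimal sketch. It assumes the method previously returned two values, which the diff context suggests but this thread does not show; both function names are hypothetical stand-ins.

```python
def prepare_inputs_old():
    # Assumed previous behaviour: two return values.
    return "use_cache_branch_tensor", "past_key_values"


def prepare_inputs_new():
    # Behaviour after this PR: cache_position is returned as a third value.
    return "use_cache_branch_tensor", "past_key_values", "cache_position"


# A caller written against the old signature fails at unpacking time.
try:
    branch, past = prepare_inputs_new()
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)
```

Callers that unpack the result by position would need updating, which is why it matters that only the forward pass uses the method.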

optimum/onnxruntime/modeling_seq2seq.py (outdated, resolved)
@sreenivasulureddysura

@IlyasMoutawwakil Does this PR already support transformers version 4.43.3?

@IlyasMoutawwakil
Member Author

@sreenivasulureddysura yep it's ready

@echarlaix echarlaix merged commit f8f9707 into main Aug 5, 2024
60 of 64 checks passed
@echarlaix echarlaix deleted the support-transformers-4.43 branch August 5, 2024 13:00
echarlaix added a commit that referenced this pull request Aug 5, 2024
* fix bt bark test

* setup

* patch clip models for sd

* infer ort model dtype property from inputs dtypes

* patch all clip variants

* device setter

* bigger model for now

* fix device attribution

* onnx opset for owlvit and owlv2

* model dtype

* revert

* use model part dtype instead

* no need for dtype with diffusion pipelines

* revert

* fix clip text model with projection not outputting hidden states

* whisper generation

* fix whisper, support cache_position, and using transformers whisper generation loop

* style

* create cache position for merged decoder and fix test for non whisper speech to text

* typo

* conditioned cache position argument

* update whisper min transformers version

* compare whisper ort generation with transformers

* fix generation length for speech to text model type

* cache position in whisper only with dynamic axis decoder_sequence_length

* use minimal prepare_inputs_for_generation in ORTModelForSpeechSeq2Seq

* remove version restrictions on whisper

* comment

* fix

* simpler

---------

Co-authored-by: Ella Charlaix <[email protected]>
robertknight added a commit to robertknight/rten that referenced this pull request Oct 27, 2024
Support the `cache_position` input that was added to Hugging Face Whisper models
as part of a revision of how it handles KV-caching.

This is like `position_ids`, but there is no batch dimension.

See huggingface/optimum#1971 and
huggingface/transformers#31166.
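
The difference between the two inputs described above can be sketched in plain Python. This is an illustrative assumption about shapes only, using nested lists in place of tensors; `make_position_inputs` is a hypothetical name, not an API from transformers, optimum, or rten.

```python
from typing import List, Tuple


def make_position_inputs(
    past_length: int, new_tokens: int, batch_size: int
) -> Tuple[List[int], List[List[int]]]:
    """Build both position inputs for one decoding step.

    cache_position: 1-D, shape (new_tokens,), shared by the whole batch.
    position_ids:   2-D, shape (batch_size, new_tokens), one row per sequence.
    """
    cache_position = list(range(past_length, past_length + new_tokens))
    position_ids = [list(cache_position) for _ in range(batch_size)]
    return cache_position, position_ids


# Decoding one new token after 5 cached tokens, with a batch of 2.
cache_position, position_ids = make_position_inputs(
    past_length=5, new_tokens=1, batch_size=2
)
```

Here `cache_position` is `[5]` while `position_ids` is `[[5], [5]]`: the same positions, with and without the batch dimension.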
4 participants