Support transformers 4.43 #1971
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
OwlViT and OwlV2 need an upgrade to their ONNX config (opset 11 -> 12 for the einsum operator); I will push it once the onnxruntime tests finish (to see if there's anything left).
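A minimal sketch of what such a bump looks like in an optimum ONNX config (abbreviated: real configs also declare their inputs/outputs, and the base-class import path is an assumption about optimum's exporter layout, not this PR's exact diff):

```python
from optimum.exporters.onnx.config import TextAndVisionOnnxConfig

# torch.einsum only became exportable to ONNX at opset 12, so a default of 11
# makes the OwlViT/OwlV2 exports fail on their einsum nodes.
class OwlViTOnnxConfig(TextAndVisionOnnxConfig):
    DEFAULT_ONNX_OPSET = 12  # bumped from 11 for the einsum operator
```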
Thanks a lot for taking care of this @IlyasMoutawwakil
should be all fixed now
Looks great, thanks a lot @IlyasMoutawwakil
if use_torch is True:
    cache_position = cache_position.to(self.device)

return use_cache_branch_tensor, past_key_values, cache_position
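For context, a self-contained sketch of what this diff does (everything except `cache_position` and the device move is an illustrative assumption):

```python
import torch

# cache_position indexes the slots of the KV cache that the new tokens will
# occupy; unlike position_ids it has no batch dimension.
past_length, num_new_tokens = 8, 1
cache_position = torch.arange(past_length, past_length + num_new_tokens)

# When the caller passes torch tensors (use_torch is True), the tensor has to
# live on the model's device before it is handed to the ONNX Runtime session,
# otherwise a CPU tensor meets a CUDA-bound model at inference time.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cache_position = cache_position.to(device)
```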
this is a breaking change, so we should be careful; not sure this method is used by anyone though
the method is only used by the forward pass; I don't think any sub-packages use it
…gface/optimum into support-transformers-4.43
@IlyasMoutawwakil Is this PR ready to support transformers version 4.43.3?
@sreenivasulureddysura yep, it's ready
* fix bt bark test
* setup
* patch clip models for sd
* infer ort model dtype property from inputs dtypes
* patch all clip variants
* device setter
* bigger model for now
* fix device attribution
* onnx opset for owlvit and owlv2
* model dtype
* revert
* use model part dtype instead
* no need for dtype with diffusion pipelines
* revert
* fix clip text model with projection not outputting hidden states
* whisper generation
* fix whisper, support cache_position, and using transformers whisper generation loop
* style
* create cache position for merged decoder and fix test for non whisper speech to text
* typo
* conditioned cache position argument
* update whisper min transformers version
* compare whisper ort generation with transformers
* fix generation length for speech to text model type
* cache position in whisper only with dynamic axis decoder_sequence_length
* use minimal prepare_inputs_for_generation in ORTModelForSpeechSeq2Seq
* remove version restrictions on whisper
* comment
* fix
* simpler

---------

Co-authored-by: Ella Charlaix <[email protected]>
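One of the commits above infers the ORT model's dtype from its input dtypes, so that `pipeline()` can call `model.dtype` on an `ORTModel`. A minimal sketch of the idea (the mapping table and function name are illustrative assumptions, not optimum's actual code):

```python
import onnxruntime as ort
import torch

# ONNX Runtime reports graph input element types as strings such as
# "tensor(float)"; map the floating-point ones to torch dtypes.
_ORT_TO_TORCH_DTYPE = {
    "tensor(float)": torch.float32,
    "tensor(float16)": torch.float16,
    "tensor(bfloat16)": torch.bfloat16,
}

def infer_model_dtype(session: ort.InferenceSession) -> torch.dtype:
    """Return the dtype of the first floating-point graph input."""
    for graph_input in session.get_inputs():
        if graph_input.type in _ORT_TO_TORCH_DTYPE:
            return _ORT_TO_TORCH_DTYPE[graph_input.type]
    # Models whose inputs are all integers (e.g. input_ids only) give no
    # signal, so fall back to float32.
    return torch.float32
```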
Support the `cache_position` input that was added to Hugging Face Whisper models as part of a revision of how they handle KV-caching. This is like `position_ids`, but there is no batch dimension. See huggingface/optimum#1971 and huggingface/transformers#31166.
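To make the shape difference concrete, a small illustration (sizes are arbitrary):

```python
import torch

batch_size, past_length, num_new_tokens = 2, 5, 1

# position_ids carries a batch dimension: one row of positions per sequence.
position_ids = torch.arange(past_length, past_length + num_new_tokens)
position_ids = position_ids.unsqueeze(0).expand(batch_size, -1)
print(position_ids.shape)  # torch.Size([2, 1])

# cache_position is shared across the batch: a 1-D index into the KV cache.
cache_position = torch.arange(past_length, past_length + num_new_tokens)
print(cache_position.shape)  # torch.Size([1])
```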
What does this PR do?
This PR adds support for transformers 4.43, with which the following issues emerge:
- `clip` models using `sdpa` attention (see the sketch after this list).
- `pipeline` calling `model.dtype` to convert inputs to the correct floating point precision.
- `bark` models can't be saved due to shared tensors (tracked in "`BarkModel` can't be saved anymore" transformers#32224).
- whisper models requiring the new `cache_position` input.
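On the first item: per the PR description, CLIP models started using `sdpa` attention with transformers 4.43, which is what the export path had to be patched around. A hedged sketch of the usual workaround pattern (whether this matches the PR's internal patch is an assumption; the checkpoint is just an example):

```python
from transformers import CLIPTextModelWithProjection

# Request the eager attention implementation instead of the sdpa default,
# which keeps the traced graph friendly to the ONNX exporter.
model = CLIPTextModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32",
    attn_implementation="eager",
)
```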
Before submitting
Who can review?