[fix] Quantization of token embeddings #2885
base: master

Conversation
if not isinstance(embeddings[0], list) and len(embeddings[0].shape) == 2:
    # It will happen when we request token_embeddings
    lengths = [embedding.shape[0] for embedding in embeddings]
    embeddings = np.concatenate(embeddings)
if isinstance(embeddings[0], Tensor):
Shouldn't this if statement be above the previous if, as sending in a list of Tensors is also valid?
@ir2718 You were absolutely right, thank you! Changed the order of the statements.
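For context, a minimal sketch of how the checks end up ordered after that change. The helper name and its scaffolding are made up for illustration; only the two if statements mirror the diff above:

```python
import numpy as np
from torch import Tensor


def _flatten_token_embeddings(embeddings):
    # Convert a list of Tensors to numpy arrays first, so the
    # token-embeddings branch below also covers Tensor inputs.
    if isinstance(embeddings[0], Tensor):
        embeddings = [embedding.cpu().numpy() for embedding in embeddings]

    lengths = None
    if not isinstance(embeddings[0], list) and len(embeddings[0].shape) == 2:
        # Happens when token_embeddings are requested: one 2D array per input text.
        lengths = [embedding.shape[0] for embedding in embeddings]
        embeddings = np.concatenate(embeddings)

    return embeddings, lengths
```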
Hello! Apologies, I haven't yet had time to look into this a bit deeper, but I think an edge case that might be missed is when encode returns a list of dictionaries, e.g. with output_value=None.

Not sure if I can modify the PR, but following Tom's dict edge case, I think adding this should suffice, together with a modification in SentenceTransformer.py, line 638, right before the return statement.
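Purely as an illustration of that dict edge case (not the code proposed above), here is a hypothetical sketch of quantizing the token embeddings when encode returns a list of dicts, i.e. output_value=None. The helper name and the post-processing call site are assumptions; only quantize_embeddings is the library's existing helper:

```python
from sentence_transformers.quantization import quantize_embeddings


def quantize_output_dicts(outputs, precision="binary"):
    # Hypothetical post-processing for output_value=None, where encode
    # returns one dict per input text instead of a plain array.
    for out in outputs:
        # "token_embeddings" holds a 2D (num_tokens, dim) tensor for each text.
        out["token_embeddings"] = quantize_embeddings(
            out["token_embeddings"], precision=precision
        )
    return outputs
```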
Thanks, @ir2718! I wonder whether we should return a dictionary. That breaks the interface of the encode method. @tomaarsen Would that be the expected behaviour?
I decided to implement the quantization for this edge case differently than suggested. @ir2718 I didn't use the attention mask on purpose; I thought it would be best to keep the shapes consistent, no matter if we use quantization or not.
Agreed, I was thinking about that myself, but since transformers mostly handle things in dicts my first idea was to implement it that way. Not breaking the interface is probably a better solution, but it requires adding some kind of note in the docs about the ordering of embeddings.
Problem

The `encode` method raises a `ValueError` when we request a `precision` different than `float32` and `output_value="token_embeddings"`, as reported in #2882.

Solution

This PR provides a fix that combines all the token embeddings into a single array, runs the quantization, and eventually reconstructs the shape of the original array so we can distinguish the token embeddings coming from each input example.
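A rough sketch of that idea (simplified, not the exact diff; quantize_embeddings is the library's existing helper, everything else is illustrative):

```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings


def quantize_token_embeddings(token_embeddings, precision="binary"):
    # token_embeddings: list of (num_tokens_i, dim) arrays, one per input text.
    lengths = [embedding.shape[0] for embedding in token_embeddings]
    merged = np.concatenate(token_embeddings)

    # Quantize all token vectors in a single call.
    quantized = quantize_embeddings(merged, precision=precision)

    # Split back so each input text keeps its own (num_tokens_i, ...) block.
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(quantized, boundaries)
```

Merging first also means that, for int8/uint8 precision, the calibration ranges should be computed over all token vectors at once rather than separately per input text.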