Fix prepare + apply #7
Conversation
Last 2 commits include:
@gauravjain14 this PR addresses huggingface#35029 (comment)
Overall, the changes look good to me. I was able to run the failing test cases and they seem to have been resolved with this.
Thanks @jmamou! It's good news that @gauravjain14's tests pass for this PR. I added some questions and minor comments, mostly about simplifying the implementation.
```python
if i > 0:
    self._prev_assistant_ids = self._prev_assistant_ids[:, :-i]
assistant_input_ids = torch.cat([self._prev_assistant_ids, assistant_new_ids], dim=-1)
assistant_input_ids = assistant_input_ids.to(torch.int)
```
According to the documentation, `cat` operates on tensors of the same type. Wdyt about ensuring that `self._prev_assistant_ids` and `assistant_new_ids` are already of `torch.int` type?
Do you mean adding, before `cat`:
```python
self._prev_assistant_ids = self._prev_assistant_ids.to(torch.int)
assistant_new_ids = assistant_new_ids.to(torch.int)
```
Wdyt about ensuring we only assign `torch.int` to `self._prev_assistant_ids` and `assistant_new_ids` in the first place, so that we never need to cast them to `torch.int`?
We get all the IDs from the tokenizer, and their type is `int`. Do you think it is necessary to ensure they are of `int` type?
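(For illustration, a minimal standalone sketch of the "cast once at assignment" idea discussed above; the tensor values are placeholders, not the PR's actual data:)

```python
import torch

# Cast to torch.int once, at assignment time, so the later torch.cat never
# has to reconcile mismatched dtypes.
prev_assistant_ids = torch.tensor([[1, 2, 3]]).to(torch.int)  # stand-in for self._prev_assistant_ids
assistant_new_ids = torch.tensor([[4, 5]]).to(torch.int)      # stand-in for assistant_new_ids

# torch.cat requires all tensors to share the same dtype (and matching shapes
# except along the concatenation dimension), so no extra .to(torch.int) is needed here.
assistant_input_ids = torch.cat([prev_assistant_ids, assistant_new_ids], dim=-1)
assert assistant_input_ids.dtype == torch.int
```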
I'm somewhat puzzled by the `target_vocab_size` argument. 👀
With model `microsoft/Phi-3-medium-128k-instruct`, `len(target_tokenizer.get_vocab())` = 32011 while `config.vocab_size` = 32064.

Where/why do we set `config.vocab_size = 32064` if we know that `len(target_tokenizer.get_vocab()) = 32011`?
We don't set it. I suppose that some models pad their vocabulary size for efficiency, since 64 is a power of 2. Another example and relevant discussion: https://huggingface.co/microsoft/phi-1_5/discussions/29
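(For context, a quick way to reproduce the discrepancy yourself; this is a sketch that only fetches the tokenizer and config files from the Hub, and the exact counts quoted here come from this thread and may differ across model revisions:)

```python
from transformers import AutoConfig, AutoTokenizer

# Depending on your transformers version, Phi-3 may require trust_remote_code=True.
model_id = "microsoft/Phi-3-medium-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print(len(tokenizer.get_vocab()))  # 32011 tokens actually defined by the tokenizer
print(config.vocab_size)           # 32064: padded up to a multiple of 64 (32064 = 501 * 64)
```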
@jmamou, thanks for the clarification about the vocabulary sizes. I've added a few comments. My main concern is the suggested breaking change in `SuppressTokensLogitsProcessor`.
original implementation of …
My concern is that such a change might cause failures for users of Hugging Face Transformers who call it directly. Another option is to extend the API of the existing class without breaking it, or to create an entirely new class.
Sorry for the misunderstanding ... I was not aware that …
I opt for the second option of creating a new class.
Sounds good. Bugs in the existing …
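(To make the non-breaking "new class" alternative concrete, a hypothetical sketch; the class name and masking behavior below are assumptions for illustration, not the class actually added by this PR:)

```python
import torch
from transformers import LogitsProcessor

class SuppressTokensOutsideVocabLogitsProcessor(LogitsProcessor):
    """Illustrative only: masks logits for token ids at or above a given vocabulary size,
    leaving SuppressTokensLogitsProcessor untouched for existing users."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Suppress every position past the real vocabulary (e.g. padding added for efficiency).
        scores[:, self.vocab_size:] = -float("inf")
        return scores
```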
What is the expectation on transformers/src/transformers/generation/utils.py, line 2165 (commit 1ed1de2)?
When I run this script - https://gist.github.com/gauravjain14/19edce088b1f1e7b5dc9ace684e53f8d - with …, is this expected? @jmamou, @keyboardAnt?
@jmamou, please hit the 'Re-request Review' button when you're ready.
LGTM.
After resolving conflicts and tests, the remaining failing test does not seem to be related to USD: https://app.circleci.com/pipelines/github/huggingface/transformers/113897/workflows/98283892-64b7-4e14-b8a3-8f7da8f9aa61/jobs/1523429?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary