Fix prepare + apply #7

Merged: 29 commits into usd from fix_prepare on Dec 17, 2024
Conversation

@jmamou (Collaborator) commented Dec 8, 2024

No description provided.

@jmamou (Collaborator, Author) commented Dec 9, 2024

The last 2 commits include:

  • simplify suppress_tokens
  • refactor AssistantToTargetTranslator to avoid moving tensors to the CPU
  • fix _prepare_assistant_input_ids of USD
  • fix a logits_processors bug: the logits processors were called after sampling the assistant token ids, instead of before (see the sketch below)
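
A minimal sketch of the corrected ordering, assuming the standard Hugging Face LogitsProcessor call signature (processor(input_ids, scores)); the function below is illustrative, not the exact candidate_generator.py code:

```python
import torch

def sample_assistant_token(logits, logits_processors, input_ids):
    # Run every logits processor (e.g. token suppression) on the raw logits
    # *before* sampling; the bug was that they ran after the ids were drawn.
    for processor in logits_processors:
        logits = processor(input_ids, logits)
    probs = torch.softmax(logits, dim=-1)
    # Only then sample the assistant token id from the processed distribution.
    return torch.multinomial(probs, num_samples=1)
```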

@jmamou (Collaborator, Author) commented Dec 9, 2024

@gauravjain14 this PR addresses huggingface#35029 (comment)

@jmamou jmamou requested a review from keyboardAnt December 9, 2024 15:04
@gauravjain14 (Collaborator) commented:
@jmamou
To try this, do I need to apply the changes on top of #6?

@jmamou (Collaborator, Author) commented Dec 9, 2024

> @jmamou To try this, do I need to apply the changes on top of #6?

No, just check out the fix_prepare branch.

@gauravjain14 (Collaborator) left a comment:

Overall, the changes look good to me. I was able to run the failing test cases, and they seem to be resolved by this PR.

Review thread on src/transformers/generation/candidate_generator.py (outdated, resolved)
@keyboardAnt (Owner) left a comment:

Thanks @jmamou! It's good news that @gauravjain14's tests pass for this PR. I added some questions and minor comments, mostly about simplifying the implementation.

Review threads (outdated, resolved) on:
  • src/transformers/generation/utils.py
  • src/transformers/generation/candidate_generator.py (4 threads)
Review thread on this hunk:

```python
if i > 0:
    self._prev_assistant_ids = self._prev_assistant_ids[:, :-i]
assistant_input_ids = torch.cat([self._prev_assistant_ids, assistant_new_ids], dim=-1)
assistant_input_ids = assistant_input_ids.to(torch.int)
```
@keyboardAnt (Owner):

According to the documentation, torch.cat operates on tensors of the same type. Wdyt about ensuring that self._prev_assistant_ids and assistant_new_ids are already of torch.int type?

@jmamou (Collaborator, Author):

Do you mean adding, before the cat:

```python
self._prev_assistant_ids = self._prev_assistant_ids.to(torch.int)
assistant_new_ids = assistant_new_ids.to(torch.int)
```

@keyboardAnt (Owner):

Wdyt about ensuring we only assign torch.int to self._prev_assistant_ids and assistant_new_ids in the first place, so that we never need to cast them to torch.int?

@jmamou (Collaborator, Author):

We get all the IDs from the tokenizer, and their type is int. Do you think it is necessary to ensure they are of int type?
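
For reference, a quick check (illustrative; any checkpoint works here) showing that ids returned by a Hugging Face tokenizer with return_tensors="pt" are already integer-typed:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
ids = tokenizer("hello world", return_tensors="pt").input_ids
print(ids.dtype)  # torch.int64, so an explicit cast is mostly defensive
```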

Review threads (outdated, resolved) on src/transformers/generation/candidate_generator.py (2 threads)
@keyboardAnt (Owner) left a comment:

I'm somewhat puzzled by the target_vocab_size argument. 👀

Review threads (outdated, resolved) on:
  • src/transformers/generation/logits_process.py
  • src/transformers/generation/candidate_generator.py (2 threads)
@keyboardAnt (Owner) left a comment:

With model microsoft/Phi-3-medium-128k-instruct, len(target_tokenizer.get_vocab()) = 32011 while config.vocab_size = 32064.

Where/why do we set config.vocab_size = 32064 if we know that len(target_tokenizer.get_vocab()) = 32011?

@jmamou (Collaborator, Author) commented Dec 12, 2024

> With model microsoft/Phi-3-medium-128k-instruct, len(target_tokenizer.get_vocab()) = 32011 while config.vocab_size = 32064.
>
> Where/why do we set config.vocab_size = 32064 if we know that len(target_tokenizer.get_vocab()) = 32011?

We don't set it; it is part of the model config:
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/blob/main/config.json#L169

I suppose that some models pad their vocabulary size for efficiency; 64 is a power of 2.

Another example: Qwen/Qwen2-0.5B-Instruct.

Relevant discussion: https://huggingface.co/microsoft/phi-1_5/discussions/29
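
A small script to reproduce the mismatch (it downloads the config and tokenizer; the expected numbers are those quoted above):

```python
from transformers import AutoConfig, AutoTokenizer

name = "microsoft/Phi-3-medium-128k-instruct"
config = AutoConfig.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
print(config.vocab_size)           # 32064 (padded up to a multiple of 64)
print(len(tokenizer.get_vocab()))  # 32011 (entries the tokenizer actually defines)
```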

@keyboardAnt (Owner) left a comment:

@jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.

Review threads (outdated, resolved) on:
  • src/transformers/generation/logits_process.py
  • tests/generation/test_configuration_utils.py
  • src/transformers/generation/utils.py
  • src/transformers/generation/candidate_generator.py (2 threads)
@jmamou (Collaborator, Author) commented Dec 15, 2024

> @jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.

The original implementation of SuppressTokensLogitsProcessor was buggy and not optimal. Please explain your concern ...

@keyboardAnt (Owner) commented Dec 15, 2024

> @jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.
>
> The original implementation of SuppressTokensLogitsProcessor was buggy and not optimal. Please explain your concern ...

My concern is that such a change might break users of Hugging Face Transformers who call SuppressTokensLogitsProcessor and expect the existing API; changing the API would require those users to adjust their current code.

Another option is to extend the existing class's API without breaking it, or to create an entirely new class.

@jmamou (Collaborator, Author) commented Dec 15, 2024

> SuppressTokensLogitsProcessor

Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.

@jmamou (Collaborator, Author) commented Dec 15, 2024

> SuppressTokensLogitsProcessor
>
> Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.

I opt for the second option of creating a new class.

@keyboardAnt (Owner):

> Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.
>
> I opt for the second option of creating a new class.

Sounds good. Bugs in the existing SuppressTokensLogitsProcessor will then no longer be relevant for USD and can be reported to Hugging Face or fixed in separate PRs (not urgent).
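
To make option two concrete, here is a hedged sketch of what such a new class could look like; the class name and the exact masking strategy (suppressing the padded tail of the vocabulary, per the vocab-size discussion above) are assumptions for illustration, not the code merged in this PR:

```python
import torch

class SuppressVocabPaddingLogitsProcessor:
    """Masks every token id at or above `start_index`, e.g. the padded tail
    of the vocabulary beyond what the tokenizer actually defines.
    Illustrative sketch; leaves SuppressTokensLogitsProcessor's API untouched."""

    def __init__(self, start_index: int):
        self.start_index = start_index

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Set logits of the suppressed range to -inf so those ids can never be sampled.
        scores[:, self.start_index:] = float("-inf")
        return scores
```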

@gauravjain14 (Collaborator) commented:
What is the expectation on generation_mode being ASSISTED_GENERATION when speculative decoding with different tokenizers is enabled?

```python
if generation_mode == GenerationMode.ASSISTED_GENERATION:
```

When I run this script (https://gist.github.com/gauravjain14/19edce088b1f1e7b5dc9ace684e53f8d) with do_sample=True, the first call into generate has generation_mode=GenerationMode.ASSISTED_GENERATION, but subsequent calls into the function have generation_mode=GenerationMode.GREEDY_SEARCH.

Is this expected? @jmamou, @keyboardAnt?

@jmamou (Collaborator, Author) commented Dec 16, 2024

> What is the expectation on generation_mode being ASSISTED_GENERATION when speculative decoding with different tokenizers is enabled?
>
> When I run this script (https://gist.github.com/gauravjain14/19edce088b1f1e7b5dc9ace684e53f8d) with do_sample=True, the first call into generate has generation_mode=GenerationMode.ASSISTED_GENERATION, but subsequent calls into the function have generation_mode=GenerationMode.GREEDY_SEARCH.
>
> Is this expected? @jmamou, @keyboardAnt?

The generation_mode of the target is GenerationMode.ASSISTED_GENERATION, while the generation_mode of the assistant model should be GenerationMode.SAMPLE (do_sample=True) or GenerationMode.GREEDY_SEARCH (do_sample=False).
That's why you get GenerationMode.ASSISTED_GENERATION for the first call to generate (self is the target), but you should get GenerationMode.SAMPLE for subsequent generate calls (self is the assistant) until the target's generate call for validation (see the sketch below).
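
A simplified sketch of this mode split (the real dispatch lives in transformers' generation code; this toy function is only an illustration of the rule described above):

```python
def get_generation_mode(assistant_model=None, do_sample=False):
    # Outer call: the target model validates candidate tokens, so it runs
    # in assisted-generation mode.
    if assistant_model is not None:
        return "ASSISTED_GENERATION"
    # Inner calls: the assistant drafts candidates on its own, so it falls
    # back to plain sampling or greedy search.
    return "SAMPLE" if do_sample else "GREEDY_SEARCH"
```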

@keyboardAnt (Owner):
@jmamou, please hit the 'Re-request Review' button when you're ready.

@jmamou requested a review from keyboardAnt on December 17, 2024 at 10:33
@keyboardAnt (Owner) left a comment:

LGTM.

@keyboardAnt merged commit 9d4d9f9 into usd on Dec 17, 2024
@keyboardAnt deleted the fix_prepare branch on December 17, 2024 at 22:52
@keyboardAnt (Owner) commented:
@jmamou, it seems like the changes fail the CI tests. Do they pass for you locally?

@jmamou (Collaborator, Author) commented Dec 18, 2024

> @jmamou, it seems like the changes fail the CI tests. Do they pass for you locally?

After resolving conflicts and fixing tests, the remaining failing test does not seem to be related to USD: https://app.circleci.com/pipelines/github/huggingface/transformers/113897/workflows/98283892-64b7-4e14-b8a3-8f7da8f9aa61/jobs/1523429?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary
