Fix prepare + apply #7

Merged: 29 commits into usd from fix_prepare on Dec 17, 2024
Conversation

@jmamou (Collaborator) commented Dec 8, 2024

No description provided.

@jmamou (Collaborator, Author) commented Dec 9, 2024

The last 2 commits include:

  • simplify suppress_tokens
  • refactor AssistantToTargetTranslator to avoid moving tensors to the CPU
  • fix _prepare_assistant_input_ids of USD
  • fix a logits_processors bug: the logits processors were called after sampling the assistant token ids, instead of before (see the sketch below)
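
A minimal sketch of the corrected ordering, assuming the standard Hugging Face LogitsProcessor call signature (processor(input_ids, scores)); the function below is illustrative, not the exact candidate_generator.py code:

```python
import torch

def sample_assistant_token(logits, logits_processors, input_ids):
    # Run every logits processor (e.g. token suppression) on the raw logits
    # *before* sampling; the bug was that they ran after the ids were drawn.
    for processor in logits_processors:
        logits = processor(input_ids, logits)
    probs = torch.softmax(logits, dim=-1)
    # Only then sample the assistant token id from the processed distribution.
    return torch.multinomial(probs, num_samples=1)
```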

@jmamou (Collaborator, Author) commented Dec 9, 2024

@gauravjain14 this PR addresses huggingface#35029 (comment)

@jmamou jmamou requested a review from keyboardAnt December 9, 2024 15:04
@gauravjain14 (Collaborator) commented:
@jmamou
To try this, do I need to apply the changes on top of #6?

@jmamou (Collaborator, Author) commented Dec 9, 2024

> @jmamou To try this, do I need to apply the changes on top of #6?

No, just check out the fix_prepare branch.

@gauravjain14 (Collaborator) left a comment:

Overall, the changes look good to me. I was able to run the failing test cases, and they seem to be resolved by this PR.

Review thread on src/transformers/generation/candidate_generator.py (outdated, resolved)
@keyboardAnt (Owner) left a comment:

Thanks @jmamou! It's good news that @gauravjain14's tests pass for this PR. I added some questions and minor comments, mostly about simplifying the implementation.

Review threads (outdated, resolved) on:
  • src/transformers/generation/utils.py
  • src/transformers/generation/candidate_generator.py (4 threads)
Review thread on this hunk:

```python
if i > 0:
    self._prev_assistant_ids = self._prev_assistant_ids[:, :-i]
assistant_input_ids = torch.cat([self._prev_assistant_ids, assistant_new_ids], dim=-1)
assistant_input_ids = assistant_input_ids.to(torch.int)
```
@keyboardAnt (Owner):

According to the documentation, torch.cat operates on tensors of the same type. Wdyt about ensuring that self._prev_assistant_ids and assistant_new_ids are already of torch.int type?

@jmamou (Collaborator, Author):

Do you mean adding, before the cat:

```python
self._prev_assistant_ids = self._prev_assistant_ids.to(torch.int)
assistant_new_ids = assistant_new_ids.to(torch.int)
```

@keyboardAnt (Owner):

Wdyt about ensuring we only assign torch.int to self._prev_assistant_ids and assistant_new_ids in the first place, so that we never need to cast them to torch.int?

@jmamou (Collaborator, Author):

We get all the IDs from the tokenizer, and their type is int. Do you think it is necessary to ensure they are of int type?
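
For reference, a quick check (illustrative; any checkpoint works here) showing that ids returned by a Hugging Face tokenizer with return_tensors="pt" are already integer-typed:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
ids = tokenizer("hello world", return_tensors="pt").input_ids
print(ids.dtype)  # torch.int64, so an explicit cast is mostly defensive
```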

Review threads (outdated, resolved) on src/transformers/generation/candidate_generator.py (2 threads)
@keyboardAnt (Owner) left a comment:

I'm somewhat puzzled by the target_vocab_size argument. 👀

Review threads (outdated, resolved) on:
  • src/transformers/generation/logits_process.py
  • src/transformers/generation/candidate_generator.py (2 threads)
@keyboardAnt (Owner) left a comment:

With model microsoft/Phi-3-medium-128k-instruct, len(target_tokenizer.get_vocab()) = 32011 while config.vocab_size = 32064.

Where/why do we set config.vocab_size = 32064 if we know that len(target_tokenizer.get_vocab()) = 32011?

@jmamou (Collaborator, Author) commented Dec 12, 2024

> With model microsoft/Phi-3-medium-128k-instruct, len(target_tokenizer.get_vocab()) = 32011 while config.vocab_size = 32064.
>
> Where/why do we set config.vocab_size = 32064 if we know that len(target_tokenizer.get_vocab()) = 32011?

We don't set it; it is part of the model config:
https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/blob/main/config.json#L169

I suppose that some models pad their vocabulary size for efficiency; 64 is a power of 2.

Another example: Qwen/Qwen2-0.5B-Instruct.

Relevant discussion: https://huggingface.co/microsoft/phi-1_5/discussions/29
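
A small script to reproduce the mismatch (it downloads the config and tokenizer; the expected numbers are those quoted above):

```python
from transformers import AutoConfig, AutoTokenizer

name = "microsoft/Phi-3-medium-128k-instruct"
config = AutoConfig.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
print(config.vocab_size)           # 32064 (padded up to a multiple of 64)
print(len(tokenizer.get_vocab()))  # 32011 (entries the tokenizer actually defines)
```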

@keyboardAnt (Owner) left a comment:

@jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.

Review threads (outdated, resolved) on:
  • src/transformers/generation/logits_process.py
  • tests/generation/test_configuration_utils.py
  • src/transformers/generation/utils.py
  • src/transformers/generation/candidate_generator.py (2 threads)
@jmamou (Collaborator, Author) commented Dec 15, 2024

> @jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.

The original implementation of SuppressTokensLogitsProcessor was buggy and not optimal. Please explain your concern ...

@keyboardAnt (Owner) commented Dec 15, 2024

> @jmamou, thanks for the clarification about the vocabulary sizes. I’ve added a few comments. My main concern is the suggested breaking change in SuppressTokensLogitsProcessor.
>
> The original implementation of SuppressTokensLogitsProcessor was buggy and not optimal. Please explain your concern ...

My concern is that such a change might break users of Hugging Face Transformers who call SuppressTokensLogitsProcessor and expect the existing API; changing the API would require those users to adjust their current code.

Another option is to extend the existing class's API without breaking it, or to create an entirely new class.

@jmamou (Collaborator, Author) commented Dec 15, 2024

> SuppressTokensLogitsProcessor

Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.

@jmamou (Collaborator, Author) commented Dec 15, 2024

> SuppressTokensLogitsProcessor
>
> Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.

I opt for the second option of creating a new class.

@keyboardAnt (Owner):

> Sorry for the misunderstanding ... I was not aware that SuppressTokensLogitsProcessor was already part of HF transformers.
>
> I opt for the second option of creating a new class.

Sounds good. Bugs in the existing SuppressTokensLogitsProcessor will then no longer be relevant for USD and can be reported to Hugging Face or fixed in separate PRs (not urgent).
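
To make option two concrete, here is a hedged sketch of what such a new class could look like; the class name and the exact masking strategy (suppressing the padded tail of the vocabulary, per the vocab-size discussion above) are assumptions for illustration, not the code merged in this PR:

```python
import torch

class SuppressVocabPaddingLogitsProcessor:
    """Masks every token id at or above `start_index`, e.g. the padded tail
    of the vocabulary beyond what the tokenizer actually defines.
    Illustrative sketch; leaves SuppressTokensLogitsProcessor's API untouched."""

    def __init__(self, start_index: int):
        self.start_index = start_index

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Set logits of the suppressed range to -inf so those ids can never be sampled.
        scores[:, self.start_index:] = float("-inf")
        return scores
```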

@gauravjain14 (Collaborator) commented:
What is the expectation on generation_mode being ASSISTED_GENERATION when speculative decoding with different tokenizers is enabled?

```python
if generation_mode == GenerationMode.ASSISTED_GENERATION:
```

When I run this script (https://gist.github.com/gauravjain14/19edce088b1f1e7b5dc9ace684e53f8d) with do_sample=True, the first call into generate has generation_mode=GenerationMode.ASSISTED_GENERATION, but subsequent calls into the function have generation_mode=GenerationMode.GREEDY_SEARCH.

Is this expected? @jmamou, @keyboardAnt?

@jmamou (Collaborator, Author) commented Dec 16, 2024

> What is the expectation on generation_mode being ASSISTED_GENERATION when speculative decoding with different tokenizers is enabled?
>
> When I run this script (https://gist.github.com/gauravjain14/19edce088b1f1e7b5dc9ace684e53f8d) with do_sample=True, the first call into generate has generation_mode=GenerationMode.ASSISTED_GENERATION, but subsequent calls into the function have generation_mode=GenerationMode.GREEDY_SEARCH.
>
> Is this expected? @jmamou, @keyboardAnt?

The generation_mode of the target is GenerationMode.ASSISTED_GENERATION, while the generation_mode of the assistant model should be GenerationMode.SAMPLE (do_sample=True) or GenerationMode.GREEDY_SEARCH (do_sample=False).
That's why you get GenerationMode.ASSISTED_GENERATION for the first call to generate (self is the target), but you should get GenerationMode.SAMPLE for subsequent generate calls (self is the assistant) until the target's generate call for validation (see the sketch below).
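
A simplified sketch of this mode split (the real dispatch lives in transformers' generation code; this toy function is only an illustration of the rule described above):

```python
def get_generation_mode(assistant_model=None, do_sample=False):
    # Outer call: the target model validates candidate tokens, so it runs
    # in assisted-generation mode.
    if assistant_model is not None:
        return "ASSISTED_GENERATION"
    # Inner calls: the assistant drafts candidates on its own, so it falls
    # back to plain sampling or greedy search.
    return "SAMPLE" if do_sample else "GREEDY_SEARCH"
```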

@keyboardAnt (Owner):
@jmamou, please hit the 'Re-request Review' button when you're ready.

@jmamou requested a review from keyboardAnt on December 17, 2024 at 10:33
@keyboardAnt (Owner) left a comment:

LGTM.

@keyboardAnt merged commit 9d4d9f9 into usd on Dec 17, 2024
@keyboardAnt deleted the fix_prepare branch on December 17, 2024 at 22:52
@keyboardAnt (Owner) commented:
@jmamou, it seems like the changes fail the CI tests. Do they pass for you locally?

@jmamou (Collaborator, Author) commented Dec 18, 2024

> @jmamou, it seems like the changes fail the CI tests. Do they pass for you locally?

After resolving conflicts and fixing tests, the remaining failing test does not seem to be related to USD: https://app.circleci.com/pipelines/github/huggingface/transformers/113897/workflows/98283892-64b7-4e14-b8a3-8f7da8f9aa61/jobs/1523429?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary
