-
Notifications
You must be signed in to change notification settings - Fork 27.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring AssistedCandidateGenerator
for Improved Modularity and Reusability
#35009
Conversation
@zucchini-nlp feel free to review it :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for making the code more composable! LGTM as nothing changed in terms of functionality. Just left one comment as we had several PRs in parallel modifying the assisted generation code :)
a120dbf
to
d4ff091
Compare
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @keyboardAnt ! LGTM as nothing was changed except for composing code into smaller functions.
The only question is the use of self.prev_target_ids
which I've mentioned was removed in prev PR explaining it is not needed. This is what I got from reviewing the prev PR
hmm, so to make sure, that means the prev impl when we checked prev_target_ids was not really correct? And we should check the length of already accepted input_ids
yes
So do we need to save prev token ids from target model or we can re-use the current token ids, because the current token ids in any case will have the prev token ids as prefix with new accepted tokens appended at the end
For UAG, we need just the number of tokens in |
Ah that makes sense if we don't care about the actual token ids used previously, because the tokens should be available without storing them. Then we can indeed store only the prev length |
4745dee
to
fcd129f
Compare
Thanks, @zucchini-nlp. I've replaced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keyboardAnt
please apply the same fix as at keyboardAnt#4.
else SD will not work when target and assistant are not on the same device
@keyboardAnt |
You can now request review from the core maintainer to merge this :) |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for the detailed explanations 🤗
Merging!
Ah can you resolve conflicts? 🤗 |
@ArthurZucker done! |
@ArthurZucker @zucchini-nlp |
thanks, merging |
What does this PR do?
This PR refactors the
AssistedCandidateGenerator
andAssistedCandidateGeneratorDifferentTokenizers
classes to improve code readability, maintainability, and reusability. By breaking down large methods into smaller helper functions and reusing common logic, we enhance the modularity of these classes. This refactoring lays a cleaner foundation for upcoming features likeUniversalSpeculativeDecodingGenerator
without introducing any new functionality.Background
While working on Universal Speculative Decoding (#34760), which introduces the
UniversalSpeculativeDecodingGenerator
, we identified opportunities to refactor existing code. The goal is to reuse core logic across different candidate generators and simplify the integration of new features that enable speculative decoding across models with different tokenizers.By submitting this refactoring as a separate PR, we aim to:
This refactor is a collaboration with @jmamou, who has already reviewed it (keyboardAnt#1).
Key Changes
1. Code Restructuring
get_candidates
methods in both classes into smaller, focused helper functions._calculate_new_tokens
_update_past_and_masks
_prepare_generation_args
_generate_candidates
__init__
methods to remove redundancy and enhance clarity.2. Improved Reusability
AssistedCandidateGeneratorDifferentTokenizers
, methods like_prepare_assistant_input_ids
and_process_assistant_outputs
handle tokenizer-specific logic.3. Enhanced Readability
Motivation
This refactoring is motivated by the need to:
By isolating these changes, we enable reviewers to focus solely on the structural improvements without the added complexity of new features. This approach helps maintain a high code quality standard and simplifies the review and merging process.
Dependencies
Before Submitting
CandidateGenerator
]] #34760).Who Can Review?
The following reviewers are well-suited to review this PR: @gante, @ArthurZucker
This PR aims to strengthen the foundation for speculative decoding and other future enhancements by improving the existing code's structure and maintainability. We appreciate your time and look forward to your feedback.