Auto-apply chat template in SequenceGenerator and SequenceGeneratorAdapter, if available #1019
base: main
Conversation
Thanks so much for implementing this!

There are some things I'd like to see before this is ready for merge. Could you please address the following:

- Test cases which ensure application of the chat template works properly, both for a standard case and for a missing chat template.
- Consider that some users of Outlines are relying on the current behavior, and make `apply_chat_template=False` the default. Perhaps we can also warn the user if they're not setting the parameter.
- Add an `apply_chat_template` explanation to the docs.
- Use `apply_chat_template=True` in all `models.transformers` generator examples (see the sketch after this list).
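A minimal sketch of what such a doc example might look like, assuming the `apply_chat_template` parameter proposed in this PR is exposed on the generator constructors; the model name and the `False` default are illustrative, not the final API:

```python
import outlines

# Any instruct-tuned model whose tokenizer ships a chat template.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Opt in explicitly, assuming the proposed default is apply_chat_template=False.
generator = outlines.generate.text(model, apply_chat_template=True)

answer = generator("What is the capital of France?")
```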
```python
from outlines.models.transformers import TransformerTokenizer

if isinstance(prompts, str):
    prompts = [prompts]
```
I think the signature should be `List[str] -> List[str]`, and we should raise an error if a list isn't passed. In `transformers.py`, this function is called after the prompts are normalized to 2D anyway.
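A sketch of the suggested signature, assuming a standalone helper wrapping the Hugging Face tokenizer (the function name is illustrative):

```python
from typing import List

from transformers import PreTrainedTokenizerBase


def apply_chat_templates(
    prompts: List[str], tokenizer: PreTrainedTokenizerBase
) -> List[str]:
    """Apply the tokenizer's chat template to a batch of prompts."""
    if not isinstance(prompts, list):
        # Callers in transformers.py normalize prompts before this point,
        # so a bare string here indicates a programming error.
        raise TypeError(f"Expected a list of prompts, got {type(prompts)}")
    return [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=False,
            add_generation_prompt=True,
        )
        for prompt in prompts
    ]
```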
```python
)
return prompts

tokenizer: "TransformerTokenizer" = model.tokenizer
if getattr(tokenizer.tokenizer, "chat_template", None) is None:
```
Should we use `default_chat_template` instead? https://huggingface.co/docs/transformers/v4.34.0/chat_templating#how-do-chat-templates-work
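A sketch of what that fallback could look like, assuming the `default_chat_template` attribute from the linked transformers v4.34-era docs (it was deprecated in later releases):

```python
from typing import Optional

from transformers import PreTrainedTokenizerBase


def resolve_chat_template(tokenizer: PreTrainedTokenizerBase) -> Optional[str]:
    """Return the configured chat template, falling back to the class-level
    default template when the tokenizer config does not define one."""
    return (
        getattr(tokenizer, "chat_template", None)
        or getattr(tokenizer, "default_chat_template", None)
    )
```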
```python
    **model_specific_params,
):
    """Return a text generator from a prompt or a list of prompts."""
    if apply_chat_template is None:
```
To make this pythonic, we should have one obvious way of applying a chat template. IMO the argument should only be accepted in the constructor.
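A sketch of the constructor-only variant being suggested; the class is a simplified stand-in for `SequenceGeneratorAdapter`, not the actual Outlines internals:

```python
from typing import Callable

from transformers import PreTrainedTokenizerBase


class ChatTemplateGenerator:
    """Simplified stand-in: the chat-template decision is made once, in the
    constructor, and never overridden per call."""

    def __init__(
        self,
        generate: Callable[[str], str],
        tokenizer: PreTrainedTokenizerBase,
        apply_chat_template: bool = False,
    ):
        self._generate = generate
        self._tokenizer = tokenizer
        self._apply_chat_template = apply_chat_template

    def __call__(self, prompt: str) -> str:
        # One obvious way to do it: the constructor setting decides.
        if self._apply_chat_template:
            prompt = self._tokenizer.apply_chat_template(
                [{"role": "user", "content": prompt}],
                tokenize=False,
                add_generation_prompt=True,
            )
        return self._generate(prompt)
```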
```python
    )

    def apply_chat_template(self, prompt: str) -> str:
        messages = self.get_messages(prompt)
```
We should probably have this be part of the `Tokenizer` class and raise `NotImplementedError` by default.
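A sketch of that suggestion, using a simplified stand-in for the Outlines `Tokenizer` base class:

```python
from abc import ABC


class Tokenizer(ABC):
    """Simplified stand-in for the Outlines tokenizer interface."""

    def apply_chat_template(self, prompt: str) -> str:
        # Backends that actually have a chat template override this method;
        # everything else fails loudly instead of silently passing the
        # prompt through.
        raise NotImplementedError(
            f"{type(self).__name__} does not support chat templates."
        )
```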
I don't think this design is coherent with the rest of the library; we want to avoid kwargs as much as we possibly can. Here I would simply add an `apply_chat_template` method to the model instead.
How do you think the interface should look? Here, I mirrored Huggingface's Pipeline interface, where we can specify configs/args in the constructor and (optionally) override them in the model call. I like it because it's more flexible, but it does make things a bit more complicated and less pythonic.

I did a quick look at other libraries, and it seems that they either (1) don't auto-apply the chat templates at all or (2) have a separate code path for base and instruct/chat models (e.g. TGI & MixEval). I think there are two reasons why.

So, for now, we need a way to let the library know whether we're dealing with a base model or an instruct/chat model. Worst case, we might also need to ask the user to specify the system prompt. But if we're going to force them to go to all that trouble anyway, we might as well not do this by default.

I think a good compromise is to (1) not apply the chat templates by default but (2) warn the user if the chat template is specified in the tokenizer config but is not being used.
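A minimal sketch of that warning-based compromise, assuming a helper around the Hugging Face tokenizer (the function name is illustrative):

```python
import warnings

from transformers import PreTrainedTokenizerBase


def warn_if_chat_template_unused(
    tokenizer: PreTrainedTokenizerBase, apply_chat_template: bool
) -> None:
    """Warn when a chat template is configured but will not be applied."""
    if not apply_chat_template and getattr(tokenizer, "chat_template", None):
        warnings.warn(
            "This tokenizer defines a chat template, but apply_chat_template "
            "is False, so prompts will be passed to the model verbatim. "
            "Instruct/chat models usually perform better when their chat "
            "template is applied.",
            UserWarning,
        )
```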
```python
answer = generator(
    model.apply_chat_template(prompt)
)
```

It would substantially simplify the code in this PR as well.
I like this pull request that adds the possibility to use the tokenizer's chat template.
Indeed, we are not going to make it the default behavior. Users should be able to inspect the modified prompt before they pass it to the generator.
@leloykun are you still working on this?
@rlouf yup! I just deprioritized this in favor of other stuff; I'll get back to this soon. Btw, thanks for the feedback, @alonsosilvaallende!
So can this feature still be added? A chat template is really needed in many applications.
Yes, if you'd like to introduce it in a PR, I'll review and support its inclusion provided it follows the behavior discussed in this thread! :)
This PR auto-applies chat templates by default when using instruct/chat models. It doesn't support LlamaCPP for now, though.
Why?
Instruct/chat models tend to be annoyingly template-dependent (i.e., they perform worse if the prompts don't follow the chat template used during finetuning), and the longer they are finetuned, the worse the problem gets. Hence this PR.
Also see issue #987, raised by @lapp0.
Interface changes
This PR has minimal impact on the interface. It just changes the default behavior of the generation step.
However, this feature can be disabled either when creating the generator:
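A sketch of what disabling at creation time could look like with the `apply_chat_template` parameter proposed in this PR; the model name is only an example:

```python
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Opt out at creation time: prompts are passed to the model verbatim.
generator = outlines.generate.text(model, apply_chat_template=False)
```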
Or when calling the generator:
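And a sketch of the per-call override, again assuming the kwarg proposed in this PR:

```python
# Opt out for a single call, overriding the generator-level setting.
answer = generator(
    "Write a haiku about constrained decoding.",
    apply_chat_template=False,
)
```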