
Allow for arguments to be passed to the base model #216

Closed · havietisov opened this issue Aug 13, 2023 · 3 comments · Fixed by #366

Labels: enhancement · text (Linked to text generation) · transformers (Linked to the `transformers` integration)

Comments

@havietisov

Looking at the auto-gptq code, I spotted that it already contains KV caching:
https://github.com/PanQiWei/AutoGPTQ/blob/main/auto_gptq/nn_modules/fused_llama_attn.py

It would be really nice to let users take advantage of this.
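For readers unfamiliar with the idea: a KV cache stores the attention keys and values already computed for earlier tokens, so each decoding step only processes the newest token instead of re-running the whole prefix. A toy, model-free sketch of the work saved (the `decode` function and its counting are purely illustrative, not code from either library):

```python
# Toy illustration of the KV-cache idea (no real model involved):
# without a cache, each decoding step recomputes keys/values for the
# entire prefix; with a cache, each step only handles the newest token.

def decode(num_steps, use_cache):
    cache = []            # stands in for cached key/value projections
    kv_computations = 0
    tokens = [0]          # a prompt of length 1
    for step in range(num_steps):
        if use_cache:
            new = tokens[len(cache):]   # only the uncached tokens
        else:
            new = tokens                # recompute everything
            cache = []
        for t in new:
            cache.append(t)             # pretend this is a K/V projection
            kv_computations += 1
        tokens.append(step + 1)         # "generate" the next token
    return kv_computations

print(decode(10, use_cache=False))  # quadratic growth in work
print(decode(10, use_cache=True))   # linear growth in work
```

The cached variant does one unit of key/value work per generated token, while the uncached variant redoes the whole prefix every step.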

@rlouf
Member

rlouf commented Aug 13, 2023

I agree. #190 tracks the discussion around KV caching. We have bigger plans for this, but we've been focusing on constrained generation almost exclusively so far 🙂

We can certainly implement the naive version quickly, it's just a small interface change. Will bump it up on the priority list.
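Such an interface change could amount to forwarding caller-supplied keyword arguments straight down to the underlying model call. A hypothetical sketch, assuming a wrapper named `generate` and a catch-all `model_kwargs` (neither is the library's actual API):

```python
# Hypothetical sketch of forwarding user-supplied arguments to the base
# model; `generate` and `model_kwargs` are illustrative names only.

def generate(model, prompt, max_tokens=16, **model_kwargs):
    """Pass any extra keyword arguments straight through to the model."""
    outputs = []
    for _ in range(max_tokens):
        token = model(prompt, **model_kwargs)  # e.g. use_cache=True
        outputs.append(token)
    return outputs

# A stand-in "model" that records which kwargs it received:
received = {}

def fake_model(prompt, **kwargs):
    received.update(kwargs)
    return "tok"

generate(fake_model, "hi", max_tokens=2, use_cache=True)
print(received)  # {'use_cache': True}
```

The point is that the wrapper stays agnostic about what the base model accepts, so options like `use_cache` reach it without any per-option plumbing.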

@havietisov
Author

I could borrow the mechanism from Hugging Face transformers if it weren't for #209, which makes installing from source a bit sketchy.

@rlouf
Member

rlouf commented Aug 13, 2023

I'm going to investigate this one more tomorrow; I may have some follow-up questions.

@rlouf added the text (Linked to text generation), enhancement, and transformers (Linked to the `transformers` integration) labels on Aug 13, 2023
@rlouf closed this as completed in #366 on Dec 8, 2023