Allow for arguments to be passed to the base model #216

havietisov · 2023-08-13T13:46:06Z

Looking at the auto-gptq code, I spotted that it already contains KV caching:
https://github.com/PanQiWei/AutoGPTQ/blob/main/auto_gptq/nn_modules/fused_llama_attn.py

It would be really nice to let users utilize this.
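
To make the request in the title concrete, here is a minimal sketch of a wrapper that forwards extra keyword arguments to the underlying model. All names here (`ModelWrapper`, `fake_base_model`, and the `use_cache` / `past_key_values` parameters of the stand-in) are hypothetical and chosen for illustration only; this is not the library's actual interface.

```python
from typing import Any, Callable, Dict, List, Optional


class ModelWrapper:
    """Hypothetical wrapper that forwards keyword arguments to a base model."""

    def __init__(self, base_model: Callable[..., Any], **model_kwargs: Any) -> None:
        self.base_model = base_model
        # Defaults supplied once, at construction time.
        self.model_kwargs = model_kwargs

    def __call__(self, input_ids: List[int], **call_kwargs: Any) -> Any:
        # Per-call arguments override the construction-time defaults.
        kwargs = {**self.model_kwargs, **call_kwargs}
        return self.base_model(input_ids, **kwargs)


def fake_base_model(
    input_ids: List[int],
    use_cache: bool = False,
    past_key_values: Optional[Any] = None,
) -> Dict[str, Any]:
    """Stand-in for a real model; it only reports the arguments it received."""
    return {
        "n_tokens": len(input_ids),
        "use_cache": use_cache,
        "has_past": past_key_values is not None,
    }


wrapped = ModelWrapper(fake_base_model, use_cache=True)
result = wrapped([1, 2, 3])
```

With this shape, a user could opt into whatever caching the base model already supports without the wrapper needing to know about it.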

rlouf · 2023-08-13T15:23:32Z

I agree. #190 tracks the discussion around KV caching. We have bigger plans for this, but we've been focusing on constrained generation almost exclusively so far 🙂

We can certainly implement the naive version quickly; it's just a small interface change. I will bump it up on the priority list.
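
As a rough illustration of what a naive version could look like, the sketch below has the model return its cache alongside the next token, and the generation loop pass that cache back in so only the newest token is processed at each step. `ToyModel`, `generate`, and the list-based `Cache` are assumptions made for this example; a real implementation would carry per-layer key/value tensors instead.

```python
from typing import List, Optional, Tuple

# Stand-in for per-layer key/value tensors (an assumption for this sketch).
Cache = List[int]


class ToyModel:
    """Hypothetical base model that accepts and returns a cache."""

    def __init__(self) -> None:
        self.tokens_processed = 0  # counts how many tokens were "computed"

    def __call__(
        self, token_ids: List[int], past: Optional[Cache] = None
    ) -> Tuple[int, Cache]:
        past = list(past) if past is not None else []
        # Only the tokens passed in cost work; the rest come from the cache.
        self.tokens_processed += len(token_ids)
        cache = past + token_ids
        # Toy "prediction": sum of the full context modulo 10.
        next_token = sum(cache) % 10
        return next_token, cache


def generate(
    model: ToyModel, prompt: List[int], max_new_tokens: int, use_cache: bool = True
) -> List[int]:
    tokens = list(prompt)
    past: Optional[Cache] = None
    for _ in range(max_new_tokens):
        if use_cache and past is not None:
            # Feed only the newest token, plus the cache from the previous step.
            next_token, past = model(tokens[-1:], past)
        else:
            # First step, or caching disabled: process the whole sequence.
            next_token, cache = model(tokens)
            past = cache if use_cache else None
        tokens.append(next_token)
    return tokens


cached_model = ToyModel()
cached_output = generate(cached_model, [1, 2, 3], 3, use_cache=True)

uncached_model = ToyModel()
uncached_output = generate(uncached_model, [1, 2, 3], 3, use_cache=False)
```

Both runs produce the same tokens, but the cached run processes each token once while the uncached run reprocesses the whole sequence at every step.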

havietisov · 2023-08-13T18:30:32Z

I could borrow the mechanism from huggingface transformers if not for #209, which makes the process of installing from source a bit sketchy.

rlouf · 2023-08-13T20:59:18Z

I'm going to investigate this one further tomorrow and may have some follow-up questions.

rlouf added the text (linked to text generation), enhancement, and transformers (linked to the `transformers` integration) labels Aug 13, 2023

rlouf mentioned this issue Nov 15, 2023

Refactor the sequence generation #366

Merged

27 tasks

rlouf closed this as completed in #366 Dec 8, 2023
