How to use 3k context size with Llama2? #167
How can one use 3k context size with Llama2? Trying out such a size makes the python process hog one core indefinitely.
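For reference, raising the context window through ExLlama's Python API looks roughly like the sketch below. The import paths, attribute names and file names follow the upstream examples but should be treated as assumptions, and the model directory is a placeholder:

```python
# Run from the exllama repo root (adjust imports if installed as a package).
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "/path/to/Llama-2-13B-GPTQ"  # placeholder path

config = ExLlamaConfig(f"{model_dir}/config.json")
config.model_path = f"{model_dir}/model.safetensors"
config.max_seq_len = 3072  # 3k context; Llama 2 is trained with a 4096-token window

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(f"{model_dir}/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello, my name is", max_new_tokens=32))
```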
What's your setup and what front-end are you using? On ExLlama's web UI you'd just add …
Thanks for the quick response! I'm on Ubuntu 22.04, 32 GB RAM, RTX 4080 with 16 GB VRAM. Using ExLlama's web UI is where it goes wrong; it seems to work fine in text-gen-ui with the exllama backend. Here's a screencast: Screencast.from.19-07-23.18.20.09.webm. Notice the …
It seems to be an oversight in how it truncates the context. My guess is that the block of text ends up being longer than the total context length, so it gets stuck in a loop where it has to trim the whole block from the context, but it also won't trim the very last item. I'm not really sure how to address it, though. I guess it could fail more gracefully, but there isn't really any underlying support for working with partial text blocks, so it would be a bit of a rewrite to enable that, and I'm not sure what good it would do, since you'd get an incorrect response to the prompt in any case if it gets truncated.
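For illustration, here is a minimal sketch (a hypothetical helper, not ExLlama's actual code) of the kind of whole-block trimming loop described above, with the guard it needs so an oversized newest block makes it fail instead of spinning forever on one core:

```python
def trim_context(blocks, max_tokens):
    """blocks: oldest-first list of (text, n_tokens) pairs; only whole blocks are dropped."""
    while sum(n for _, n in blocks) > max_tokens:
        if len(blocks) == 1:
            # The newest block alone is longer than max_tokens. Without this guard
            # the condition above never becomes false, so the loop spins forever --
            # the single-core hang reported in this issue.
            raise ValueError("newest context block exceeds the maximum context length")
        blocks.pop(0)  # drop the oldest whole block
    return blocks
```

Trimming inside the newest block instead of raising would require splitting a block at the token level, which is the "partial text blocks" rewrite mentioned above.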
Well, I added that anyway. If you try it again it should work, unless you're encountering some other bug.
Thanks, I can confirm the Python process freeze is gone now and the output just looks cut off. I've added a PR (#169) to give some sort of warning that truncation is happening; what do you think? Closing the issue as the problem is fixed.
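The gist of such a warning (just the general idea, not necessarily what PR #169 does) can be as simple as wrapping the trimming step from the sketch above:

```python
import warnings

def build_prompt(blocks, max_tokens):
    # Hypothetical wrapper: warn the user whenever older context is dropped
    # instead of truncating the prompt silently.
    kept = trim_context(list(blocks), max_tokens)
    if len(kept) < len(blocks):
        warnings.warn(
            f"Context exceeded {max_tokens} tokens; "
            f"dropped {len(blocks) - len(kept)} oldest block(s) from the prompt."
        )
    return "".join(text for text, _ in kept)
```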