
How to use 3k context size with Llama2? #167

Closed
vadi2 opened this issue Jul 19, 2023 · 5 comments
Comments

vadi2 commented Jul 19, 2023

How can one use 3k context size with Llama2? Trying out such a size makes the python process hog one core indefinitely.

turboderp (Owner) commented

What's your setup and what front-end are you using? On ExLlama's web UI you'd just add -l 3000 on the command line. Llama2 supports up to 4096 tokens natively so you won't need to do anything else. Beyond that size you'll need to specify a higher --alpha value.
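
(For reference, a minimal sketch of the NTK-style rotary base scaling that an `--alpha` setting typically corresponds to; the formula, `head_dim` default, and function name here are illustrative assumptions, not taken from ExLlama's code.)

```python
# Illustrative sketch of NTK-style RoPE "alpha" scaling; not necessarily
# identical to ExLlama's internal implementation.

def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    """Scale the rotary-embedding base so positions beyond the model's native
    context window still map to frequencies the model was trained on."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1.0 leaves the base untouched, which is why a 3000-token context
# needs no extra flags on a model that natively supports 4096 tokens.
print(scaled_rope_base(1.0))  # 10000.0
# A higher alpha stretches the base for contexts beyond 4096 tokens.
print(scaled_rope_base(2.0))  # ~20221.0
```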

vadi2 (Author) commented Jul 19, 2023

Thanks for the quick response! I'm on Ubuntu 22.04 with 32 GB RAM and an RTX 4080 with 16 GB VRAM. ExLlama's web UI is where it goes wrong; it seems to work fine in text-gen-ui using the exllama backend.

Here's a screencast -

Screencast.from.19-07-23.18.20.09.webm

Notice the python process is at 8%; it'll stay like that for a while without making progress. Ignore the gjs process, I think that's the gnome-shell recording taking its toll.

turboderp (Owner) commented

It seems to be an oversight in how it truncates the context. My guess is that block of text ends up being longer than the total context length, so it gets stuck in a loop where it has to trim the whole block from the context, but it also won't trim the very last item.

I'm not really sure how to address it, though. I guess it could fail more gracefully, but there really isn't any underlying support for working with partial text blocks, so it would be a bit of a rewrite to enable that, and I'm not sure what good it would do since you'd get an incorrect response to the prompt in any case, if it gets truncated.
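
(To make the failure mode concrete, here's a hypothetical sketch of such a trimming loop; the function name and structure are invented for illustration and are not ExLlama's actual code.)

```python
# Hypothetical sketch of the trimming loop described above (names and
# structure invented for illustration; this is not ExLlama's actual code).

def trim_context(blocks: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Drop whole blocks from the front of the history until it fits."""
    while sum(count_tokens(b) for b in blocks) > max_tokens:
        if len(blocks) == 1:
            # The very last block is never trimmed. Without a check like this,
            # a single block longer than max_tokens can never fit, so the loop
            # spins forever -- the hang seen in the screencast.
            raise ValueError("prompt block exceeds the context length")
        blocks.pop(0)  # trim the oldest whole block
    return blocks
```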

turboderp (Owner) commented

Well, I added that anyway. If you try it again it should work, unless you're encountering some other bug.

vadi2 (Author) commented Jul 20, 2023

Thanks, I can confirm the python process freeze is gone now and the output acts like it's been cut off. I've opened a PR (#169) to give some sort of warning that truncation is happening; what do you think?

Closing the issue as the problem is fixed.

vadi2 closed this as completed Jul 20, 2023