
How to use 3k context size with Llama2? #167

Closed
vadi2 opened this issue Jul 19, 2023 · 5 comments
Comments

vadi2 commented Jul 19, 2023

How can one use 3k context size with Llama2? Trying out such a size makes the python process hog one core indefinitely.

turboderp (Owner) commented

What's your setup and what front-end are you using? On ExLlama's web UI you'd just add -l 3000 on the command line. Llama2 supports up to 4096 tokens natively so you won't need to do anything else. Beyond that size you'll need to specify a higher --alpha value.
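
(For reference, a minimal sketch of the NTK-style rotary base scaling that an `--alpha` setting typically corresponds to; the formula, `head_dim` default, and function name here are illustrative assumptions, not taken from ExLlama's code.)

```python
# Illustrative sketch of NTK-style RoPE "alpha" scaling; not necessarily
# identical to ExLlama's internal implementation.

def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    """Scale the rotary-embedding base so positions beyond the model's native
    context window still map to frequencies the model was trained on."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1.0 leaves the base untouched, which is why a 3000-token context
# needs no extra flags on a model that natively supports 4096 tokens.
print(scaled_rope_base(1.0))  # 10000.0
# A higher alpha stretches the base for contexts beyond 4096 tokens.
print(scaled_rope_base(2.0))  # ~20221.0
```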

vadi2 (Author) commented Jul 19, 2023

Thanks for the quick response! I'm on Ubuntu 22.04 with 32 GB RAM and an RTX 4080 with 16 GB VRAM. ExLlama's web UI is where it goes wrong; it seems to work fine in text-gen-ui using the exllama backend.

Here's a screencast -

Screencast.from.19-07-23.18.20.09.webm

Notice the python process is at 8%; it'll stay like that for a while without making progress. Ignore the gjs process, I think that's the gnome-shell recording taking its toll.

turboderp (Owner) commented

It seems to be an oversight in how it truncates the context. My guess is that block of text ends up being longer than the total context length, so it gets stuck in a loop where it has to trim the whole block from the context, but it also won't trim the very last item.

I'm not really sure how to address it, though. I guess it could fail more gracefully, but there really isn't any underlying support for working with partial text blocks, so it would be a bit of a rewrite to enable that, and I'm not sure what good it would do since you'd get an incorrect response to the prompt in any case, if it gets truncated.
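
(To make the failure mode concrete, here's a hypothetical sketch of such a trimming loop; the function name and structure are invented for illustration and are not ExLlama's actual code.)

```python
# Hypothetical sketch of the trimming loop described above (names and
# structure invented for illustration; this is not ExLlama's actual code).

def trim_context(blocks: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Drop whole blocks from the front of the history until it fits."""
    while sum(count_tokens(b) for b in blocks) > max_tokens:
        if len(blocks) == 1:
            # The very last block is never trimmed. Without a check like this,
            # a single block longer than max_tokens can never fit, so the loop
            # spins forever -- the hang seen in the screencast.
            raise ValueError("prompt block exceeds the context length")
        blocks.pop(0)  # trim the oldest whole block
    return blocks
```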

turboderp (Owner) commented

Well, I added that anyway. If you try it again it should work, unless you're encountering some other bug.

vadi2 (Author) commented Jul 20, 2023

Thanks, I can confirm the python process freeze is gone now and the output acts like it's been cut off. I've opened a PR (#169) to give some sort of warning that truncation is happening; what do you think?

Closing the issue as the problem is fixed.

vadi2 closed this as completed Jul 20, 2023