Issues regarding changes incoming from the foundation-model-stack/gptq_bigcode PR branch #41
With a hard-coded number of cache blocks I could get around the first problem, but then another issue occurred:
With some debug prints, I saw that, for some reason, the layers are producing fp32 outputs. In contrast, when using
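For reference, this is a minimal sketch of the kind of debug check I used to spot the dtype issue; it is a generic PyTorch forward-hook pattern, not code from fms or fms-extras, and works on any `torch.nn.Module`:

```python
import torch

def report_output_dtypes(model: torch.nn.Module):
    """Register forward hooks that print each submodule's output dtype.

    Purely a debugging aid: the hooks only inspect tensors and do not
    change the model's behavior. Returns the hook handles so they can
    be removed with handle.remove() when done.
    """
    def hook(module, args, output):
        if isinstance(output, torch.Tensor):
            print(f"{module.__class__.__name__}: {output.dtype}")

    return [m.register_forward_hook(hook) for m in model.modules()]
```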
A third issue is that there are code changes for the llama and gpt bigcode models in fms, but those are not applied to the paged gpt bigcode model in fms-extras, especially the tied weights. I assume similar changes are needed there (see the sketch below).
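For context, weight tying in these decoder models usually means the output projection shares its weight tensor with the input embedding. A hedged sketch of the general pattern, with placeholder attribute names (`embedding`, `lm_head`) that are not the actual fms-extras field names:

```python
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy module illustrating tied input/output embeddings."""

    def __init__(self, vocab_size: int = 32, hidden: int = 16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden)
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)
        # Tie the output projection to the input embedding so both
        # refer to the same parameter tensor.
        self.lm_head.weight = self.embedding.weight
```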
In the function `get_max_gpu_blocks_available`, the debugger shows that when peak memory usage already exceeds the default 0.8 fraction of GPU memory, the computation produces a negative number, which eventually results in the function returning zero. I think an error should be raised at that point, since the code can't possibly work with 0 blocks in the paged cache. Raising an exception there prevents the mysterious queue-empty error that otherwise happens later when trying to acquire blocks.
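A minimal sketch of the guard I have in mind; the memory arithmetic below is only illustrative of the behavior described above, not the actual fms-extras implementation, and the function name is a placeholder:

```python
import torch

def get_max_gpu_blocks_available_checked(block_size_bytes: int,
                                         gpu_memory_utilization: float = 0.8) -> int:
    """Illustrative block-count computation that refuses to return 0 blocks."""
    total = torch.cuda.get_device_properties(0).total_memory
    peak = torch.cuda.max_memory_allocated()
    # If peak usage already exceeds the allowed fraction, this is negative.
    free_for_cache = gpu_memory_utilization * total - peak
    num_blocks = int(free_for_cache // block_size_bytes)
    if num_blocks <= 0:
        # Fail loudly here instead of letting a later block-acquire raise
        # a confusing queue-empty error.
        raise RuntimeError(
            f"Cannot allocate any paged-cache blocks: peak memory ({peak} B) already "
            f"exceeds {gpu_memory_utilization:.0%} of total GPU memory ({total} B)."
        )
    return num_blocks
```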
Here's the failed example: