Gibberish Output from 4-bit EXL2 quantization #15
Hi there,

After a few hours of waiting, I successfully quantized the Llama2-13B base model to EXL2, with an average of 4 bits per weight. However, when I tried to run inference using the webui, I got gibberish output.

I am using a Radeon VII GPU (AMD, with ROCm 0.6.0).

Here is the terminal output during quantization:
quant outputs.txt

Here are the job.json and measurement.json files from the output folder after quantization:
convert_output.zip

The calibration data file used was wikitext-2-v1.

Comments
Similar issue when using the example inference.
I'm not sure what's going on here. The perplexity after the measurement step is way too high, and that number is essentially produced by the full-precision model on a small sample of the calibration data. I can't imagine that would fail like this without Torch being completely broken in general on your system. (To be clear, I'm assuming Torch is not completely broken, so it's something else.) Which suggests there's a problem with loading the calibration data.

And I've never seen that deprecation warning before; it's produced right where the text is read from the Parquet file. If somehow what it gets from the file is garbled, that might explain it. Which split did you use (test/train/val)? And what exact versions of pandas and the other relevant libraries are you using?
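In case it helps to rule out a garbled calibration file, here is a minimal sketch (not the actual convert.py code; the filename is just the one mentioned in this thread) for checking that the Parquet file loads as readable text:

```python
# Minimal sanity check: does the calibration Parquet file decode to readable text?
import pandas as pd

df = pd.read_parquet("0000.parquet")  # e.g. the wikitext-2-v1 train split
print(df.columns)                     # the wikitext parquet files expose a "text" column
sample = "\n".join(df["text"].astype(str).tolist()[:20])
print(sample[:500])                   # should print readable English, not garbage
```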
I used the 0000.parquet from the training split. It's 6 MB in size.
Just checked whether my PyTorch is broken or not:

- ExLlama 1 with a GPTQ model: works fine
- ExLlamaV2 with the same GPTQ model: fails

Both are running in the same conda environment.
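A quick way to check that Torch itself is healthy on the ROCm device (a generic sanity test, not something from this thread; note that ROCm builds of PyTorch expose the GPU under the "cuda" device name):

```python
# Generic GPU sanity check for a ROCm build of PyTorch.
import torch

print(torch.cuda.is_available(), torch.version.hip)  # torch.version.hip is set on ROCm builds
x = torch.randn(1024, 1024, device="cuda")           # "cuda" maps to the HIP device on ROCm
y = x @ x.T
print(torch.isfinite(y).all().item(), y.mean().item())
```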
But then if I just click reload, it will load in just a few seconds without any error. Weird, considering I put my model file on an HDD and it should usually take quite a while.
I'm starting to think it might be ROCm-related after all. I tried switching to all the same library versions as you, and using the same dataset, I was able to get the deprecation warning. It seems they've made some changes to pandas recently, and I've updated the code accordingly to get rid of the warning, but it was still correctly loading and tokenizing the data regardless. I would then assume there's something unusual about the model you were converting, but given that it's also failing to do inference on GPTQ models, it must be a ROCm-related issue after all.

As for the second error... well, it might load the model very quickly the second time if it gets cached by the OS the first time around, so that's not too strange on its own. But the permission error is weird; I've never seen that before. It looks like more evidence that there's something about the new extension that isn't playing well with ROCm. Maybe @ardfork or someone else with more ROCm experience has seen this sort of error before? One thing always worth trying is deleting the extension cache from ~/.cache/torch_extensions.
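For completeness, a small sketch of clearing that extension cache from Python, equivalent to rm -rf ~/.cache/torch_extensions (assumes the default cache location):

```python
# Remove the JIT-compiled Torch extension cache so it gets rebuilt from scratch.
import shutil
from pathlib import Path

cache = Path.home() / ".cache" / "torch_extensions"
if cache.exists():
    shutil.rmtree(cache)
    print(f"Removed {cache}")
else:
    print(f"No extension cache at {cache}")
```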
This error happened on 0.0.0 (before my patch) because extra_include_paths was not correctly set: it tried to add every file on your computer to be hipified, and it only stopped because it didn't have permission on some files.

As for why it works afterwards, I don't have the answer; I'm probably missing some information, or it's some weird ooba behavior.

I don't have the time, the will, or the GPU power to try to reproduce this issue. But since inference and perplexity work correctly, there shouldn't be a reason for quantization to fail.
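To illustrate where that setting lives: Torch's JIT extension loader takes an extra_include_paths argument, and on ROCm the hipify step walks those paths. A rough sketch with hypothetical file names (not the actual exllamav2 build code):

```python
# Sketch: loading a JIT-compiled Torch extension with include paths constrained,
# so that hipify on ROCm only touches the intended source tree.
from torch.utils.cpp_extension import load

ext = load(
    name="my_ext",                              # hypothetical extension name
    sources=["ext/ext.cpp", "ext/kernels.cu"],  # hypothetical source files
    extra_include_paths=["ext"],                # if unset/wrong, hipify can wander far beyond the project
    verbose=True,
)
```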
Aren't you using the same GPU as #33? The issue might be the same.
Yeah... very likely the same issue.
Closing this as it appears to be stale. If there are still issues on ROCm, please reopen this or submit a new issue.